Fix #16344: graceful HTTP/2 GOAWAY migration to eliminate request failure window#16345
Fix #16344: graceful HTTP/2 GOAWAY migration to eliminate request failure window#16345devjue wants to merge 6 commits into
Conversation
…st failure window Keep the old channel alive after a GOAWAY frame and bring up a new connection asynchronously on the connectivity executor; atomically swap the channel reference via a closeFuture CAS so the old channel's close listener cannot wipe a freshly swapped-in new channel. - NettyConnectionHandler: schedule attemptGracefulMigration on connectivityExecutor with GRACEFUL_RECONNECT_DELAY_MS=200ms; one retry on failure, then fall back to AbstractNettyConnectionClient#onGoaway. - NettyConnectionClient#initBootstrap: closeFuture listener now compareAndClearNettyChannel(ch) instead of clearNettyChannel(). - AbstractNettyConnectionClient: add compareAndClearNettyChannel, plus package-private getConnectivityExecutor / getReconnectDuration. - NettyChannel#getChannelIfPresent: lookup-only API used by channelInactive. - TripleGoAwayHandler: log errorCode and lastStreamId. Signed-off-by: Jue Zhang <devjue@gmail.com>
b111c6d to
02e4fa9
Compare
|
@chickenlj PTAL |
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## 3.3 #16345 +/- ##
============================================
- Coverage 60.86% 60.84% -0.02%
+ Complexity 11768 11767 -1
============================================
Files 1953 1953
Lines 89260 89291 +31
Branches 13471 13472 +1
============================================
+ Hits 54329 54333 +4
- Misses 29350 29367 +17
- Partials 5581 5591 +10
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
|
We would appreciate if you add testcases to https://github.com/apache/dubbo-samples for this PR. |
There was a problem hiding this comment.
@zrlw , have been going through the PR , seen the changes made , tried to run the changes locally,test the changes, and it worked,
@Test
void testGoawayCausesNoChannelFailure() throws Throwable {
final AbstractConnectionClient connectionClient =
connectionManager.connect(url, new DefaultPuHandler());
Assertions.assertTrue(
connectionClient.isAvailable(),
"Precondition: connection should be available before GOAWAY"
);
io.netty.channel.Channel channelBeforeGoaway = connectionClient.getChannel(true);
NettyConnectionHandler handler = new NettyConnectionHandler(
(AbstractNettyConnectionClient) connectionClient);
handler.onGoAway(channelBeforeGoaway);
io.netty.channel.Channel channelAfterGoaway = connectionClient.getChannel(true);
Assertions.assertNotNull(
channelAfterGoaway,
"Channel should NOT be null immediately after GOAWAY " +
"— old channel should stay alive until new one is ready"
);
await().atMost(Duration.ofSeconds(10))
.until(() -> connectionClient.isAvailable());
Assertions.assertTrue(
connectionClient.isAvailable(),
"Connection should be available after GOAWAY migration completes"
);
connectionClient.close();
}
}
Created TripleGoAwayTest.java based on these codes. |
There was a problem hiding this comment.
Pull request overview
This PR changes Dubbo Triple (HTTP/2) Consumer-side GOAWAY handling to avoid a transient “no channel/provider available” window by keeping the old connection usable while establishing a replacement connection, then swapping the active channel safely.
Changes:
- Implement delayed, connectivity-executor-based graceful reconnection on GOAWAY (instead of immediately nulling the channel reference).
- Prevent old channel
closeFuturelisteners from clearing a newly swapped-in channel via CAS-basedcompareAndClearNettyChannel. - Add diagnostics (GOAWAY errorCode/lastStreamId logging) and a new unit test for the “channel should not become null immediately after GOAWAY” behavior.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| dubbo-rpc/dubbo-rpc-triple/src/main/java/org/apache/dubbo/rpc/protocol/tri/transport/TripleGoAwayHandler.java | Adds GOAWAY frame diagnostics and triggers connection migration on GOAWAY. |
| dubbo-remoting/dubbo-remoting-netty4/src/test/java/org/apache/dubbo/remoting/transport/netty4/TripleGoAwayTest.java | New test to validate no immediate channel-null window after GOAWAY. |
| dubbo-remoting/dubbo-remoting-netty4/src/main/java/org/apache/dubbo/remoting/transport/netty4/NettyConnectionHandler.java | Implements delayed graceful migration scheduling and avoids reconnect-on-inactive for GOAWAY-marked channels. |
| dubbo-remoting/dubbo-remoting-netty4/src/main/java/org/apache/dubbo/remoting/transport/netty4/NettyConnectionClient.java | Updates close listener to CAS-clear only the expected channel reference. |
| dubbo-remoting/dubbo-remoting-netty4/src/main/java/org/apache/dubbo/remoting/transport/netty4/NettyChannel.java | Adds cache lookup-only helper to avoid transient NettyChannel allocation on inactive channels. |
| dubbo-remoting/dubbo-remoting-netty4/src/main/java/org/apache/dubbo/remoting/transport/netty4/AbstractNettyConnectionClient.java | Adds CAS clear helper/accessors; tweaks logs; keeps old->new swap safe under concurrent close. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if (msg instanceof Http2GoAwayFrame) { | ||
| Http2GoAwayFrame goAwayFrame = (Http2GoAwayFrame) msg; | ||
| final ConnectionHandler connectionHandler = |
| import org.apache.dubbo.rpc.model.ApplicationModel; | ||
| import org.apache.dubbo.rpc.model.ModuleModel; | ||
|
|
| private static URL url; | ||
|
|
||
| private static NettyPortUnificationServer server; | ||
|
|
||
| private static ConnectionManager connectionManager; |
| int port = NetUtils.getAvailablePort(); | ||
| url = URL.valueOf("tri://127.0.0.1:" + port + "?foo=bar"); | ||
| ApplicationModel applicationModel = ApplicationModel.defaultModel(); | ||
| ApplicationConfig applicationConfig = new ApplicationConfig("provider-app"); |
| public static void close() { | ||
| try { | ||
| server.close(); | ||
| } catch (Throwable e) { | ||
| // ignored | ||
| } | ||
| } |
| if (logger.isDebugEnabled()) { | ||
| logger.debug("Connection:{} connected", this); | ||
| logger.debug("Connection:{} connected, graceful migration complete", this); | ||
| } |
What is the purpose of the change
Fixes #16344.
Eliminate the single-request failure window that occurs every time a Dubbo Triple Consumer receives an HTTP/2
GOAWAY(errorCode=0, lastStreamId=MAX_INT)frame from gateways / reverse proxies / providers that gracefully rotate connections (e.g.max_requests_per_connection,idle_timeout, hot-restart drain).Previously,
AbstractNettyConnectionClient#onGoawaynulled the channel reference immediately andNettyConnectionHandler#onGoAwaythen scheduled a reconnect, leaving a brief window in whichAbstractClusterInvoker#checkInvokersthrowsRpcExceptionbefore FailoverCluster's retry loop — soretries=Ndid NOT mitigate the issue. This PR implements graceful migration: keep the old channel serving until a new channel is ready, then atomically swap.Brief changelog
NettyConnectionHandler#onGoAway: keep old channel alive; scheduleattemptGracefulMigrationonconnectivityExecutorafterGRACEFUL_RECONNECT_DELAY_MS = 200ms; one retry on failure, then fall back toAbstractNettyConnectionClient#onGoaway.NettyConnectionClient#initBootstrap:closeFuturelistener now callscompareAndClearNettyChannel(ch)instead ofclearNettyChannel()— prevents the old channel's close listener from wiping a freshly swapped-in new channel.AbstractNettyConnectionClient: addcompareAndClearNettyChannel, plus package-privategetConnectivityExecutor()/getReconnectDuration()accessors.NettyChannel#getChannelIfPresent: lookup-only API used bychannelInactiveto avoid allocating a transientNettyChannelpurely for logging.TripleGoAwayHandler: logerrorCodeandlastStreamIdso the GOAWAY trigger is observable.Verifying this change
Validated against a local Triple demo with sustained 1 QPS traffic:
No provider availableerrors and zero request failures.:45956 → :45110 → :53222 → :58366 → :56372 → :55436 → :55188 → :52926 → :50682 → :48682 → :47626 → :44014 → :43116 → :45044 → :48602 → :51880 → :50992 → :51672dubbo-remoting-netty4anddubbo-rpc-tripleunit tests.Does this pull request potentially affect one of the following parts
connectivityExecutor; does NOT introduce new threads)Backward-compatible: defaults are unchanged; the only observable difference is that GOAWAY frames no longer cause a request failure window.
Documentation
Happy to backport to
3.2once this lands on3.3.