> On Jan 3, 2024, at 3:16 PM, Benjamin Coddington <bcodding@xxxxxxxxxx> wrote:
> 
> On 3 Jan 2024, at 14:12, Chuck Lever III wrote:
> 
>>> On Jan 3, 2024, at 1:47 PM, Benjamin Coddington <bcodding@xxxxxxxxxx> wrote:
>>> 
>>> This looks like it started out as the problem I've been sending patches to
>>> fix on 6.7, latest here:
>>> https://lore.kernel.org/linux-nfs/e28038fba1243f00b0dd66b7c5296a1e181645ea.1702496910.git.bcodding@xxxxxxxxxx/
>>> 
>>> .. however whenever I encounter the issue, the client reconnects the
>>> transport again - so I think there might be an additional problem here.
>> 
>> I'm looking at the same problem as you, Ben. It doesn't seem to be
>> similar to what Jeff reports.
>> 
>> But I'm wondering if gerry-rigging the timeouts is the right answer
>> for backchannel replies. The problem, fundamentally, is that when a
>> forechannel RPC task holds the transport lock, the backchannel's reply
>> transmit path thinks that means the transport connection is down and
>> triggers a transport disconnect.
> 
> Why shouldn't backchannel replies have normal timeout values?

RPC Replies are "send and forget". The server forechannel sends its
Replies without a timeout. There is no such thing as a retransmitted
RPC Reply (though a reliable transport might retransmit portions of it,
the RPC server itself is not aware of that).

And I don't see anything in the client's backchannel path that makes me
think there's a different protocol-level requirement in the
backchannel.

>> The use of ETIMEDOUT in call_bc_transmit_status() is... not especially
>> clear.
> 
> Seems like it should mean that the reply couldn't be sent within (what
> should be) the timeout values for the client's state management transport.

It looks to me like backchannel replies are HARD, not SOFT. That
suggests call_bc_transmit_status() is not expecting ETIMEDOUT to mean
"request timed out", which is the usual meaning of ETIMEDOUT -- and
that's only for SOFT RPCs.

See also 762e4e67b356 ("SUNRPC: Refactor RPC call encoding"), which
added RPC_TASK_NO_RETRANS_TIMEOUT to rpc_run_bc_task().

So my "read" of the code is that there currently is no timeout set, and
ETIMEDOUT isn't expected to mean the send timed out. This code does not
rely on a timeout to send replies at all. It seems to be reusing the
ETIMEDOUT error that was set as a side-effect of something else.

> I'm glad you're seeing this problem too. I was worried that something was
> seriously different about my test setup.

For the public record...

Testing with NFSv4.2, I see the client doing async COPY operations, but
then not replying to the server's CB_OFFLOAD notifications. The client
is not actually dropping the CB_OFFLOADs, though; it processes them but
then never sends the reply, because xprt_prepare_transmit() fails to
lock the transport when sending the reply. The sleep_on_timeout in
xprt_reserve_xprt() is falling through because, as you observed, the
rqst timeout settings are all zero.

So the client has processed the CB_OFFLOAD and advanced the slot
sequence number, but because the server doesn't see a reply, it leaves
its copy of the slot sequence number alone. The server sends the next
CB_OFFLOAD, and the client replies with NFS4ERR_RETRY_UNCACHED_REP
because the slot sequence numbers no longer match. The client tosses
that CB_OFFLOAD notification. The server doesn't retransmit it after
the new session is created, because DESTROY_SESSION kills the
backchannel rpc_clnt and all its tasks.
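To make that slot bookkeeping concrete, here's a toy userspace model
(not kernel code; the struct, function, and main() scenario are
invented for illustration) of the replier-side check that, as I read
the RFC 8881 session rules, produces NFS4ERR_RETRY_UNCACHED_REP when a
request is retried on a slot whose reply was never cached:

/*
 * Toy userspace model (not kernel code) of the replier-side slot
 * check described above.  All names are invented for illustration.
 */
#include <stdbool.h>
#include <stdio.h>

#define NFS4ERR_SEQ_MISORDERED		10063
#define NFS4ERR_RETRY_UNCACHED_REP	10068

struct cb_slot {
	unsigned int	seq_nr;		/* sequence ID of last request executed */
	bool		reply_cached;	/* was the reply to that request cached? */
};

/* Decide how the replier (here, the NFS client's backchannel) handles
 * a request arriving on @slot with sequence ID @seq_nr. */
static int cb_sequence_check(struct cb_slot *slot, unsigned int seq_nr)
{
	if (seq_nr == slot->seq_nr + 1) {
		/* A new request: execute it and advance the slot. */
		slot->seq_nr = seq_nr;
		slot->reply_cached = false;	/* reply never sent, never cached */
		return 0;
	}
	if (seq_nr == slot->seq_nr) {
		/* A retry of a request we already executed. */
		if (slot->reply_cached)
			return 0;		/* replay the cached reply */
		return NFS4ERR_RETRY_UNCACHED_REP;
	}
	return NFS4ERR_SEQ_MISORDERED;
}

int main(void)
{
	struct cb_slot slot = { .seq_nr = 41, .reply_cached = false };

	/* CB_OFFLOAD with sequence ID 42 is executed, but the reply is
	 * lost because the transport could not be locked. */
	printf("first CB_OFFLOAD:  %d\n", cb_sequence_check(&slot, 42));

	/* The server never saw a reply, so it sends on that slot again
	 * with sequence ID 42; the client sees a retry with no cached
	 * reply. */
	printf("second CB_OFFLOAD: %d\n", cb_sequence_check(&slot, 42));
	return 0;
}

The second call in main() is the retried CB_OFFLOAD that the client
ends up tossing in the scenario above.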
Or maybe there's some other bug in there, but I think the server should
retransmit, and it doesn't. So the client copy operation is stuck
waiting for a CB_OFFLOAD notification that never arrives. The client
could send a COPY_NOTIFY to check on the pending COPY, but it doesn't.

Certainly the client expects a retransmit. See 3832591e6fa5 ("SUNRPC:
Handle connection issues correctly on the back channel"). But I'm not
sure NFSD ever did this -- I will look into addressing that.

But also, we don't want the client spuriously disconnecting and
recycling its sessions just because the write lock is held by another
task right at that moment. That's an enormous bubble in the throughput
pipeline, and the problem seems to be worse when the forechannel is
under load. I think transmitting backchannel replies needs to be more
reliable.

--
Chuck Lever