On 3 Jan 2024, at 14:12, Chuck Lever III wrote:

>> On Jan 3, 2024, at 1:47 PM, Benjamin Coddington <bcodding@xxxxxxxxxx> wrote:
>>
>> This looks like it started out as the problem I've been sending patches to
>> fix on 6.7, latest here:
>> https://lore.kernel.org/linux-nfs/e28038fba1243f00b0dd66b7c5296a1e181645ea.1702496910.git.bcodding@xxxxxxxxxx/
>>
>> .. however whenever I encounter the issue, the client reconnects the
>> transport again - so I think there might be an additional problem here.
>
> I'm looking at the same problem as you, Ben. It doesn't seem to be
> similar to what Jeff reports.
>
> But I'm wondering if gerry-rigging the timeouts is the right answer
> for backchannel replies. The problem, fundamentally, is that when a
> forechannel RPC task holds the transport lock, the backchannel's reply
> transmit path thinks that means the transport connection is down and
> triggers a transport disconnect.

Why shouldn't backchannel replies have normal timeout values?

> The use of ETIMEDOUT in call_bc_transmit_status() is... not especially
> clear.

Seems like it should mean that the reply couldn't be sent within (what
should be) the timeout values for the client's state management transport.

I'm glad you're seeing this problem too. I was worried that something was
seriously different about my test setup.

Ben