On Wed, 2018-12-12 at 12:47 -0500, Dave Wysochanski wrote: > On Wed, 2018-12-12 at 16:56 +0000, Trond Myklebust wrote: > > On Wed, 2018-12-12 at 08:51 -0500, Dave Wysochanski wrote: > > > Commit 9b30889c548a changed the handling of TCP_CLOSE inside > > > xs_tcp_state_change. Prior to this change, the XPRT_CONNECTED > > > bit > > > was cleared unconditionally inside xprt_disconnect_done, similar > > > to the handling of TCP_CLOSE_WAIT. After the change the clearing > > > of XPRT_CONNECTED depends on successfully queueing a work based > > > xprt_autoclose which depends on XPRT_LOCKED and may not happen. > > > This is significant in the case of an unexpected RST from the > > > server, as the client will only see xs_tcp_state_change called > > > with > > > sk_state == TCP_CLOSE. Restore the unconditional clear_bit on > > > XPRT_CONNECTED while handling TCP_CLOSE and make it consistent > > > with handling TCP_CLOSE_WAIT. > > > > > > Signed-off-by: Dave Wysochanski <dwysocha@xxxxxxxxxx> > > > --- > > > net/sunrpc/xprtsock.c | 1 + > > > 1 file changed, 1 insertion(+) > > > > > > diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c > > > index 8a5e823e0b33..b9789036051d 100644 > > > --- a/net/sunrpc/xprtsock.c > > > +++ b/net/sunrpc/xprtsock.c > > > @@ -1492,6 +1492,7 @@ static void xs_tcp_state_change(struct sock > > > *sk) > > > if (sk->sk_err) > > > xprt_wake_pending_tasks(xprt, -sk- > > > > sk_err); > > > /* Trigger the socket release */ > > > + clear_bit(XPRT_CONNECTED, &xprt->state); > > > xs_tcp_force_close(xprt); > > > } > > > out: > > > > Hi Dave, > > > > This isn't needed for 4.20 or newer because call_transmit() will > > now > > always call xprt_end_transmit(). I suggest that a stable fix do > > something similar (perhaps conditional on the error returned by > > xprt_transmit()?). > > > > Can you explain the handling in xs_tcp_state_change - why > XPRT_CONNECTED would need to remain set longer in the case of > handling > TCP_CLOSE case rather than TCP_CLOSE_WAIT? It seems like if we get > an > RST we'll go directly to TCP_CLOSE and why would the XPRT_CONNECTED > bit > getting cleared need to be delayed in that case? > > I will look into the approach you mention though at the moment I do > not > see how it helps because the underlying issue seems to be clearing of > the XPRT_CONNECTED bit. > See xprt_clear_locked(). Dropping the XPRT_LOCK will automatically trigger autoclose if XPRT_CLOSE_WAIT is set, regardless of whether or not there are other tasks waiting for the lock. -- Trond Myklebust Linux NFS client maintainer, Hammerspace trond.myklebust@xxxxxxxxxxxxxxx