Hi Dave,

A reproducer is a server that sends an ACK to the client's FIN/ACK
and does nothing afterwards. (I can reproduce it 100% with a hacked-up
server. It was discovered with a "broken" server that doesn't fully
close a connection.) This leaves the client unable to connect to this
server again indefinitely; a reboot is required to recover.

At some point it was recognized that doing something like that to the
client is a form of attack (denial-of-service), and thus the kernel has
a tcp_fin_timeout option after which the kernel will abort the
connection. However, this only applies to sockets that have been
closed by the client. That is NOT the case here: NFS does not close the
connection, and it ignores the kernel's notification of the FIN_WAIT2
state.

One can argue that this is a broken server and we shouldn't bother.
But this patch is an attempt to argue that the client should still care
and deal with this condition. However, if the community feels that a
broken server is a broken server and this form of attack is not of
interest, this patch can live in the archive for later or never.

On Fri, Feb 22, 2019 at 7:12 AM Dave Wysochanski <dwysocha@xxxxxxxxxx> wrote:
>
> Hi Olga,
>
> Do you have a reproducer for this? A number of months ago I did a
> significant amount of testing with half-closed connections, after we
> had reports of connections stuck in FIN_WAIT2 in some older kernels.
> What I found was with kernels that had the tcp keepalives (commit
> 7f260e8575bf53b93b77978c1e39f8e67612759c), I could only reproduce a
> hang of a few minutes, after which time the tcp keepalive code would
> reset the connection.
>
> That said it was a while ago and something subtle may have changed.
> Also I'm not sure if your header implies an indefinite hang or just
> a few minutes.
>
> Thanks.
>
>
> On Wed, 2019-02-20 at 09:56 -0500, Olga Kornievskaia wrote:
> > From: Olga Kornievskaia <kolga@xxxxxxxxxx>
> >
> > When server replies with an ACK to client's FIN/ACK, client ends
> > up stuck in a TCP_FIN_WAIT2 state and client's mount hangs.
> > Instead, make sure to close and reset client's socket and transport
> > when transitioned into that state.
> >
> > Signed-off-by: Olga Kornievskaia <kolga@xxxxxxxxxx>
> > ---
> >  net/sunrpc/xprtsock.c | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
> > index 618e9c2..812e5e3 100644
> > --- a/net/sunrpc/xprtsock.c
> > +++ b/net/sunrpc/xprtsock.c
> > @@ -1502,6 +1502,7 @@ static void xs_tcp_state_change(struct sock *sk)
> >  		clear_bit(XPRT_CLOSE_WAIT, &xprt->state);
> >  		smp_mb__after_atomic();
> >  		break;
> > +	case TCP_FIN_WAIT2:
> >  	case TCP_CLOSE_WAIT:
> >  		/* The server initiated a shutdown of the socket */
> >  		xprt->connect_cookie++;
> > @@ -2152,6 +2153,7 @@ static void xs_tcp_shutdown(struct rpc_xprt *xprt)
> >  		kernel_sock_shutdown(sock, SHUT_RDWR);
> >  		trace_rpc_socket_shutdown(xprt, sock);
> >  		break;
> > +	case TCP_FIN_WAIT2:
> >  	case TCP_CLOSE:
> >  	case TCP_TIME_WAIT:
> >  		xs_reset_transport(transport);
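
For anyone who wants to see the TCP-level behaviour described at the top
of this message, the following is a minimal userspace sketch. It is an
assumption-laden illustration, not the hacked-up server mentioned above:
it is a plain TCP listener that does not speak RPC/NFS, and the names
open_listener() and accept_and_hold() are hypothetical. The idea is only
that the server-side kernel ACKs the peer's FIN on its own, while the
application never closes its end, so the peer should remain in FIN_WAIT2.

```c
/* Sketch only: a generic TCP listener, not the NFS server from this thread. */
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Open a TCP listener on all IPv4 addresses; returns the fd or -1. */
int open_listener(uint16_t port)
{
	struct sockaddr_in addr;
	int one = 1;
	int fd = socket(AF_INET, SOCK_STREAM, 0);

	if (fd < 0)
		return -1;
	setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one));
	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_ANY);
	addr.sin_port = htons(port);	/* 0 lets the kernel pick a port */
	if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
	    listen(fd, 16) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/*
 * Accept one connection and drain it until the peer's FIN arrives
 * (read() returns 0) or an error occurs.  The local kernel ACKs the
 * FIN by itself; the accepted fd is returned still open, so no FIN is
 * sent back: this socket sits in CLOSE_WAIT and the peer in FIN_WAIT2.
 */
int accept_and_hold(int listen_fd)
{
	char buf[4096];
	int conn = accept(listen_fd, NULL, NULL);

	if (conn < 0)
		return -1;
	while (read(conn, buf, sizeof(buf)) > 0)
		;	/* discard whatever the client sent */
	return conn;	/* deliberately not closed */
}
```

A caller would invoke open_listener() once and then accept_and_hold()
in a loop, keeping every returned fd open for the lifetime of the
process. The socket states can then be inspected on both ends with
something like "ss -tan".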