On Fri, 2019-02-22 at 07:12 -0500, Dave Wysochanski wrote:
> Hi Olga,
>
> Do you have a reproducer for this? A number of months ago I did a
> significant amount of testing with half-closed connections, after we
> had reports of connections stuck in FIN_WAIT2 in some older kernels.
> What I found was with kernels that had the tcp keepalives (commit
> 7f260e8575bf53b93b77978c1e39f8e67612759c), I could only reproduce a
> hang of a few minutes, after which time the tcp keepalive code would
> reset the connection.
>
> That said it was a while ago and something subtle may have changed.
> Also I'm not sure if your header implies an indefinite hang or just
> a few minutes.
>
> Thanks.
>
>
> On Wed, 2019-02-20 at 09:56 -0500, Olga Kornievskaia wrote:
> > From: Olga Kornievskaia <kolga@xxxxxxxxxx>
> >
> > When server replies with an ACK to client's FIN/ACK, client ends
> > up stuck in a TCP_FIN_WAIT2 state and client's mount hangs.
> > Instead, make sure to close and reset client's socket and transport
> > when transitioned into that state.

So, please do note that we do not want to ignore the FIN_WAIT2 state,
because it implies that the server has not closed the socket on its
side. That in turn means that we cannot re-establish a connection to
the server using the same source IP+port, which is problematic for
protocols such as NFSv3 that rely on a standard duplicate reply cache
for correct replay semantics.

This is why we don't just set the TCP_LINGER2 socket option and call
sock_release(). The choice to try to wait it out is deliberate, because
the alternative is that we end up with busy-waiting re-connection
attempts.

Cheers
  Trond

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx
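
For anyone following along from userspace, the TCP_LINGER2 option
mentioned above can be illustrated roughly as follows. This is only a
sketch, assuming Linux and Python's standard socket module; the helper
names make_client_socket and fin_wait2_timeout are hypothetical. It is
not what the sunrpc transport does, which (as explained above)
deliberately avoids cutting FIN_WAIT2 short.

```python
import socket


def make_client_socket(fin_wait2_timeout_s: int) -> socket.socket:
    """Create a TCP socket with a per-socket FIN_WAIT2 timeout (Linux only).

    Hypothetical helper for illustration. TCP_LINGER2 bounds how long an
    orphaned socket may sit in FIN_WAIT2 after close(); it overrides the
    system-wide tcp_fin_timeout for this socket.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_LINGER2, fin_wait2_timeout_s)
    return s


def fin_wait2_timeout(s: socket.socket) -> int:
    """Read back the FIN_WAIT2 timeout (in seconds) set on the socket."""
    return s.getsockopt(socket.IPPROTO_TCP, socket.TCP_LINGER2)


if __name__ == "__main__":
    sock = make_client_socket(5)
    print("FIN_WAIT2 timeout (s):", fin_wait2_timeout(sock))
    sock.close()
```

Shortening the timeout this way only discards the local state sooner.
The server may still hold its side of the connection open, so reusing
the same source IP+port can still fail, which is the concern raised
above.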