On Fri, Feb 22, 2019 at 8:45 AM Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> wrote:
>
> On Fri, 2019-02-22 at 07:12 -0500, Dave Wysochanski wrote:
> > Hi Olga,
> >
> > Do you have a reproducer for this? A number of months ago I did a
> > significant amount of testing with half-closed connections, after we
> > had reports of connections stuck in FIN_WAIT2 in some older kernels.
> > What I found was with kernels that had the tcp keepalives (commit
> > 7f260e8575bf53b93b77978c1e39f8e67612759c), I could only reproduce a
> > hang of a few minutes, after which time the tcp keepalive code would
> > reset the connection.
> >
> > That said it was a while ago and something subtle may have changed.
> > Also I'm not sure if your header implies an indefinite hang or just
> > a few minutes.
> >
> > Thanks.
> >
> >
> > On Wed, 2019-02-20 at 09:56 -0500, Olga Kornievskaia wrote:
> > > From: Olga Kornievskaia <kolga@xxxxxxxxxx>
> > >
> > > When server replies with an ACK to client's FIN/ACK, client ends
> > > up stuck in a TCP_FIN_WAIT2 state and client's mount hangs.
> > > Instead, make sure to close and reset client's socket and transport
> > > when transitioned into that state.
>

Hi Trond,

> So, please do note that we do not want to ignore the FIN_WAIT2 state

But we do ignore the FIN_WAIT2 state.

> because it implies that the server has not closed the socket on its
> side.

That's correct.

> That again means that we cannot re-establish a connection using
> the same source IP+port to the server, which is problematic for
> protocols such as NFSv3 which rely on standard duplicate reply cache
> for correct replay semantics.

That's exactly what's happening: the client is unable to establish a
new connection to the server. With the patch, the client does an RST
and it re-uses the port and all is well for NFSv3.

> This is why we don't just set the TCP_LINGER2 socket option and call
> sock_release(). The choice to try to wait it out is deliberate because
> the alternative is that we end up with busy-waiting re-connection
> attempts.

Why would it busy-wait? In my testing, RST happens and new connection
is established?

>
> Cheers
>   Trond
> --
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> trond.myklebust@xxxxxxxxxxxxxxx
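
A minimal userspace sketch of the RST-versus-FIN distinction discussed
above, assuming Linux and Python 3. It is not the kernel patch and does
not touch the sunrpc transport code; it only illustrates that a socket
closed with SO_LINGER enabled and a zero timeout is torn down with an
RST, so the closing side does not sit in FIN_WAIT2 waiting for the
peer's FIN. The helper names abortive_close() and peer_sees_reset() are
made up for this illustration, and the "ii" layout of struct linger is
an assumption that holds on Linux.

```python
import socket
import struct


def abortive_close(sock):
    """Close sock abortively: SO_LINGER on with a zero timeout."""
    # struct linger { int l_onoff; int l_linger; } (Linux layout assumed)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                    struct.pack("ii", 1, 0))
    sock.close()


def peer_sees_reset():
    """Return True if the accepting side observes a reset, False on a FIN."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # ephemeral port on loopback
    listener.listen(1)

    client = socket.create_connection(listener.getsockname())
    server_side, _ = listener.accept()
    server_side.settimeout(5.0)  # do not block forever if nothing arrives

    # The client closes first, as in the thread, but with an RST
    # rather than a FIN.
    abortive_close(client)
    try:
        server_side.recv(1)
        return False  # recv() returning b"" would mean an orderly FIN
    except ConnectionResetError:
        return True
    finally:
        server_side.close()
        listener.close()


if __name__ == "__main__":
    print("peer saw RST:", peer_sees_reset())
```

Whether an abortive close is the right policy for the NFS client is the
open question in this thread; the sketch only shows the TCP behaviour,
not the duplicate reply cache or reconnect-rate concerns raised above.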