On Fri, 2019-02-22 at 09:46 -0500, Olga Kornievskaia wrote:
> On Fri, Feb 22, 2019 at 8:45 AM Trond Myklebust <
> trondmy@xxxxxxxxxxxxxxx> wrote:
> > On Fri, 2019-02-22 at 07:12 -0500, Dave Wysochanski wrote:
> > > Hi Olga,
> > >
> > > Do you have a reproducer for this? A number of months ago I did a
> > > significant amount of testing with half-closed connections, after
> > > we had reports of connections stuck in FIN_WAIT2 in some older
> > > kernels. What I found was with kernels that had the tcp keepalives
> > > (commit 7f260e8575bf53b93b77978c1e39f8e67612759c), I could only
> > > reproduce a hang of a few minutes, after which time the tcp
> > > keepalive code would reset the connection.
> > >
> > > That said it was a while ago and something subtle may have
> > > changed. Also I'm not not sure if your header implies an
> > > indefinite hang or just a few minutes.
> > >
> > > Thanks.
> > >
> > > On Wed, 2019-02-20 at 09:56 -0500, Olga Kornievskaia wrote:
> > > > From: Olga Kornievskaia <kolga@xxxxxxxxxx>
> > > >
> > > > When server replies with an ACK to client's FIN/ACK, client ends
> > > > up stuck in a TCP_FIN_WAIT2 state and client's mount hangs.
> > > > Instead, make sure to close and reset client's socket and
> > > > transport when transitioned into that state.
>
> Hi Trond,
>
> > So, please do note that we do not want to ignore the FIN_WAIT2
> > state
>
> But we do ignore the FIN_WAIT2 state.

We do not. We wait for the server to send a FIN, which is precisely the
reason for which FIN_WAIT2 exists.

> > because it implies that the server has not closed the socket on its
> > side.
>
> That's correct.
>
> > That again means that we cannot re-establish a connection using
> > the same source IP+port to the server, which is problematic for
> > protocols such as NFSv3 which rely on standard duplicate reply cache
> > for correct replay semantics.
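The half-closed state discussed above can be reproduced locally. The sketch below is a userspace illustration only. It is not the SUNRPC client code and does not reproduce the mount hang. It assumes Linux, glibc's `struct tcp_info` and `TCP_FIN_WAIT2` definitions from `<netinet/tcp.h>`, and a working loopback interface. The names `tcp_state()` and `half_close_state()` are hypothetical helpers.

The client sends its FIN with `shutdown(SHUT_WR)`. The peer's kernel ACKs that FIN, but the peer application never closes its socket, so no FIN comes back. The client therefore moves from FIN_WAIT1 to FIN_WAIT2 and stays there, waiting for the server's FIN.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Return the kernel's TCP state for fd (e.g. TCP_FIN_WAIT2), or -1 on error. */
static int tcp_state(int fd)
{
    struct tcp_info info;
    socklen_t len = sizeof(info);

    memset(&info, 0, sizeof(info));
    if (getsockopt(fd, IPPROTO_TCP, TCP_INFO, &info, &len) < 0)
        return -1;
    return info.tcpi_state;
}

/*
 * Hypothetical helper: build a loopback connection, half-close the client
 * with shutdown(SHUT_WR) so it sends FIN, and let the peer's kernel ACK it
 * while the peer application keeps its socket open (it never sends its own
 * FIN). Returns the client's TCP state after waiting up to ~2 s.
 */
static int half_close_state(void)
{
    struct sockaddr_in addr;
    socklen_t alen = sizeof(addr);
    int lfd, cfd = -1, sfd = -1, state = -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                        /* kernel picks a free port */

    lfd = socket(AF_INET, SOCK_STREAM, 0);
    if (lfd < 0)
        return -1;
    if (bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(lfd, 1) < 0 ||
        getsockname(lfd, (struct sockaddr *)&addr, &alen) < 0)
        goto out;

    cfd = socket(AF_INET, SOCK_STREAM, 0);
    if (cfd < 0 || connect(cfd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        goto out;
    sfd = accept(lfd, NULL, NULL);            /* "server" side stays open */
    if (sfd < 0 || shutdown(cfd, SHUT_WR) < 0) /* client sends its FIN */
        goto out;

    /* FIN_WAIT1 -> FIN_WAIT2 once the peer's ACK for our FIN arrives. */
    for (int i = 0; i < 200; i++) {
        state = tcp_state(cfd);
        if (state == TCP_FIN_WAIT2)
            break;
        usleep(10000);
    }
out:
    if (sfd >= 0)
        close(sfd);
    if (cfd >= 0)
        close(cfd);
    close(lfd);
    return state;
}
```

A caller would check `assert(half_close_state() == TCP_FIN_WAIT2);`. The client leaves FIN_WAIT2 only when the peer application finally closes its end and sends its FIN.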
>
> that's exactly what's happening that a client is unable to establish a
> new connection to the server. With the patch, the client does an RST
> and it re-uses the port and all is well for NFSv3.

RST is not guaranteed to be delivered to the recipient. That's why the
TCP protocol defines FIN: it is guaranteed to be delivered because it is
ACKed.

> > This is why we don't just set the TCP_LINGER2 socket option and call
> > sock_release(). The choice to try to wait it out is deliberate
> > because the alternative is that we end up with busy-waiting
> > re-connection attempts.
>
> Why would it busy-wait? In my testing, RST happens and new connection
> is established?

Only if the server has dropped the connection without notifying the
client.

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx
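The two socket-level alternatives raised in the thread can be illustrated in userspace.

- `TCP_LINGER2` bounds how long an orphaned socket (one the application has already closed) may remain in FIN_WAIT2.
- `SO_LINGER` with a zero timeout makes `close()` abort the connection with an RST instead of sending a FIN.

This is a Linux-only sketch with hypothetical helper names. It is not how the kernel's RPC transport is implemented. As noted in the thread, an RST is not ACKed, so nothing guarantees the peer ever sees it.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: cap the lifetime (in seconds) of this socket in
 * FIN_WAIT2 once it is orphaned; overrides tcp_fin_timeout (Linux-only). */
static int set_fin_wait2_lifetime(int fd, int seconds)
{
    return setsockopt(fd, IPPROTO_TCP, TCP_LINGER2, &seconds, sizeof(seconds));
}

/* Read back the effective FIN_WAIT2 lifetime in seconds, or -1 on error. */
static int get_fin_wait2_lifetime(int fd)
{
    int val = 0;
    socklen_t len = sizeof(val);

    if (getsockopt(fd, IPPROTO_TCP, TCP_LINGER2, &val, &len) < 0)
        return -1;
    return val;
}

/* Hypothetical helper: make a later close() send RST (abortive close)
 * instead of performing the FIN handshake. */
static int set_abortive_close(int fd)
{
    struct linger lg = { .l_onoff = 1, .l_linger = 0 };

    return setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof(lg));
}

/* Return 1 if SO_LINGER is on with a zero timeout, 0 if not, -1 on error. */
static int is_abortive_close(int fd)
{
    struct linger lg;
    socklen_t len = sizeof(lg);

    if (getsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, &len) < 0)
        return -1;
    return lg.l_onoff != 0 && lg.l_linger == 0;
}
```

These options trade correctness for promptness in different ways. `TCP_LINGER2` only shortens how long an already-closed socket waits for the peer's FIN. A zero-linger close skips the handshake entirely, so the peer may keep its half of the connection if the RST is lost.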