Chuck,

On 04 Jun 22:57, Chuck Lever wrote:
> > I am 100% sure that XPRT_CONNECTING is the issue because 1) the state
> > had the flag up 2) there was absolutely no nfs network traffic between the
> > client and the server 3) I "unfroze" the mounts by clearing it manually.
> >
> > xs_tcp_cancel_linger_timeout, I think, is guaranteed to clear the flag.
>
> I'm speculating based on some comments in the git log, but what if
> the transport never sees TCP_CLOSE, but rather gets an error_report
> callback instead?

I don't think that could be it because xs_tcp_setup_socket() does the
connecting and is clearing the bit in all cases, so at the time you would
get a TCP_CLOSE it would have been cleared a while ago.

So that's why I thought the best explanation was finding a place where
the worker task running xs_tcp_setup_socket() is cancelled and the bit
not cleared. This is how I found xs_tcp_close().

> > Either the callback is canceled and it clears the flag or the callback
> > will do it. I am not sure how this could leave the flag set but I am
> > not familiar with this code, so I could totally be missing something
> > obvious.
> >
> > xs_tcp_close() is the only thing I have found which cancels the callback
> > and does not clear the flag.
>
> How would xs_tcp_close() be invoked?

TBH I do not know. It's the close() method of the xprt so I am assuming
there are a few places where it could be. But I am not familiar with
the code base.

> >> It's rather academic, though. All this code was replaced in 4.0.
> >
> > Well, it's not academic for all the users of the stable branches which
> > might have this bug in the kernel they're running :-)
>
> I didn't mean to be glib. The point is, stable kernels are always fixed
> by backporting an existing fix from a newer kernel.

The stable kernel rules say an "equivalent" fix in Linus' tree. I
think that Greg would pick up this fix unless it's too complicated.
Nevertheless, it's such an annoying bug I am pretty sure the distributions
would pick it up if Greg does not. We had to move an nfs server on Friday
and I got a few machines that had the same issue again...

Thanks for your help, I appreciate it.

Guillaume.

-- 
Guillaume Morin <guillaume@xxxxxxxxxxx>
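
To make the suspected sequence concrete, here is a minimal user-space
sketch of it. This is not kernel code: the struct and every model_*
function are hypothetical names invented for illustration, and the
"fixed" close is only one possible remedy, not necessarily what an
actual patch would do. It only models the claim above: if the pending
connect worker is cancelled without clearing the connecting bit, no
later connect attempt can start.

```c
#include <stdbool.h>

/* Hypothetical user-space model; none of these names exist in the kernel. */
struct model_xprt {
    bool connecting;   /* stands in for the XPRT_CONNECTING bit */
    bool work_pending; /* stands in for the queued connect worker */
};

/* Start a connect: refuse if one is already marked in progress,
 * otherwise set the bit and queue the worker. */
static bool model_connect(struct model_xprt *x)
{
    if (x->connecting)
        return false;
    x->connecting = true;
    x->work_pending = true;
    return true;
}

/* The worker clears the bit when it runs (modelled after the statement
 * that xs_tcp_setup_socket() clears it in all cases). */
static void model_worker_run(struct model_xprt *x)
{
    if (!x->work_pending)
        return;
    x->work_pending = false;
    x->connecting = false;
}

/* Suspected problem: cancel the pending worker, leave the bit set. */
static void model_close_buggy(struct model_xprt *x)
{
    x->work_pending = false;
}

/* Assumed remedy: if a pending worker was cancelled, clear the bit too. */
static void model_close_fixed(struct model_xprt *x)
{
    if (x->work_pending) {
        x->work_pending = false;
        x->connecting = false;
    }
}
```

In this model, model_connect() followed by model_close_buggy() leaves
connecting set with no worker left to clear it, so every later
model_connect() returns false. That matches the symptom described
above: the flag is up and there is no traffic until it is cleared by
hand.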