Re: [BUG] nfs3 client stops retrying to connect

Chuck,

On 04 Jun 22:57, Chuck Lever wrote:
> > I am 100% sure that XPRT_CONNECTING is the issue because 1) the state
> > had the flag up 2) there was absolutely no nfs network traffic between the
> > client and the server 3) I "unfroze" the mounts by clearing it manually.
> > 
> > xs_tcp_cancel_linger_timeout, I think, is guaranteed to clear the flag.
> 
> I'm speculating based on some comments in the git log, but what if
> the transport never sees TCP_CLOSE, but rather gets an error_report
> callback instead?

I don't think that could be it because xs_tcp_setup_socket() does the
connecting and clears the bit in all cases, so by the time you would get
a TCP_CLOSE it would have been cleared a while ago.

So that's why I thought the best explanation was finding a place where
the worker task running xs_tcp_setup_socket() is cancelled and the bit
not cleared.  This is how I found xs_tcp_close().
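
To make the suspected sequence concrete, here is a toy user-space model.
It is not the sunrpc code: model_xprt, MODEL_CONNECTING and the model_*()
functions are made-up names, and the only assumption it encodes is the one
above, namely that a connect is refused while the bit is set and that one
close path cancels the pending work without clearing the bit.

```c
#include <stdbool.h>
#include <stdio.h>

/* Toy stand-in for the transport state; NOT the real struct rpc_xprt. */
enum { MODEL_CONNECTING = 1u << 0 };

struct model_xprt {
	unsigned int state;	/* bit flags, only MODEL_CONNECTING is used */
	bool work_pending;	/* models the queued connect worker */
};

/* Queue a connect attempt unless one is already marked as in flight. */
bool model_connect(struct model_xprt *x)
{
	if (x->state & MODEL_CONNECTING)
		return false;	/* caller believes a connect is running */
	x->state |= MODEL_CONNECTING;
	x->work_pending = true;
	return true;
}

/* The worker: whatever the outcome, it clears the bit when it has run. */
void model_worker_run(struct model_xprt *x)
{
	if (!x->work_pending)
		return;
	x->work_pending = false;
	x->state &= ~MODEL_CONNECTING;
}

/* A cancel path that clears the bit when it really cancelled the work. */
void model_cancel_and_clear(struct model_xprt *x)
{
	if (x->work_pending) {
		x->work_pending = false;
		x->state &= ~MODEL_CONNECTING;
	}
}

/* The suspected pattern: the work is cancelled, the bit stays set. */
void model_close_leaky(struct model_xprt *x)
{
	x->work_pending = false;
}
```

In this model, model_connect() followed by model_close_leaky() leaves the
bit set with no worker left to clear it, so every later model_connect()
returns false.  That would match a mount that never retries until the bit
is cleared by hand.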

> > Either the callback is canceled and it clears the flag or the callback
> > will do it.  I am not sure how this could leave the flag set but I am
> > not familiar with this code, so I could totally be missing something
> > obvious.
> > 
> > xs_tcp_close() is the only thing I have found which cancels the callback
> > and does not clear the flag.
> 
> How would xs_tcp_close() be invoked?

TBH I do not know.  It's the close() method of the xprt so I am assuming
there are a few places where it could be called.  But I am not familiar
with the code base.

> >> It's rather academic, though. All this code was replaced in 4.0.
> > 
> > Well, it's not academic for all the users of the stable branches which
> > might have this bug in the kernel they're running :-)
> 
> I didn't mean to be glib. The point is, stable kernels are always fixed
> by backporting an existing fix from a newer kernel.

The stable kernel rules say an "equivalent" fix in Linus' tree.  I
think that Greg would pick up this fix unless it's too complicated.

Nevertheless, it's such an annoying bug I am pretty sure the
distributions would pick it up if Greg does not.

We had to move an nfs server on Friday and I got a few machines that had
the same issue again...

Thanks for your help, I appreciate it.

Guillaume.

-- 
Guillaume Morin <guillaume@xxxxxxxxxxx>