Re: 5.3.0 Regression: rpc.nfsd v4 uninterruptible sleep for 5+ minutes w/o rpc-statd/etc

On 19 Sep 2019, at 9:00, James Harvey wrote:

> For a really long time (years?) if you forced NFS v4 only, you could
> mask a lot of unnecessary services.
>
> In /etc/nfs.conf, in "[nfsd]", I've been able to set "vers3=n", and then
> mask the following services:
> * gssproxy
> * nfs-blkmap
> * rpc-statd
> * rpcbind (service & socket)
>
> Upgrading from 5.2.14 to 5.3.0, nfs-server.service (rpc.nfsd) has
> exactly a 5 minute delay, and sometimes longer.

A bisect ends on:
4f8943f80883 SUNRPC: Replace direct task wakeups from softirq context

That commit changed the way we pull the error from the socket.  Previously
we'd wake the task from xs_error_report() with whatever error was in sk_err.
Now we read it with SO_ERROR, but only after possibly running through
xs_wake_disconnect(), which forces a close that can change sk_err.

So I think xs_error_report() sees ECONNREFUSED, but we wake tasks with
ENOTCONN, and the client machine spins us back around again to reconnect.
We do this until things time out.
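
To make the spinning concrete, here's a rough userspace toy of that loop.
Nothing in it is kernel code: toy_connect_loop(), the 300 second timeout
(picked to match the 5 minutes in the report) and the 3 second retry delay
are all made up for illustration.  The only assumption carried over from
above is that ENOTCONN means "reconnect and try again" while any other
error ends the attempt.

```c
#include <assert.h>
#include <errno.h>

/* Toy model only: none of these names exist in the kernel. */
struct toy_result {
    int attempts;   /* connection attempts made */
    int elapsed_s;  /* simulated seconds spent retrying */
    int final_err;  /* errno value finally handed to the caller */
};

/*
 * Simulate a caller that treats ENOTCONN as "reconnect and retry" and any
 * other error as fatal.  wake_err is the error every failed attempt wakes
 * the task with; timeout_s and retry_delay_s are hypothetical knobs.
 */
static struct toy_result toy_connect_loop(int wake_err, int timeout_s,
                                          int retry_delay_s)
{
    struct toy_result r = { 0, 0, 0 };

    assert(retry_delay_s > 0);  /* a zero delay would never reach the timeout */

    for (;;) {
        r.attempts++;

        /* A definite error such as ECONNREFUSED ends things at once. */
        if (wake_err != ENOTCONN) {
            r.final_err = wake_err;
            return r;
        }

        /* ENOTCONN: wait a bit and go around again until the timeout. */
        r.elapsed_s += retry_delay_s;
        if (r.elapsed_s >= timeout_s) {
            r.final_err = ETIMEDOUT;
            return r;
        }
    }
}
```

Calling toy_connect_loop(ECONNREFUSED, 300, 3) gives up after one attempt,
while toy_connect_loop(ENOTCONN, 300, 3) keeps going until the simulated
timeout expires.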

I'll send a patch to revert to the previous behavior of waking tasks with
the error as it was in xs_error_report by copying it over to the sock_xprt
struct and waking the tasks with that value.
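
Roughly the idea, as a userspace toy rather than the actual patch.  The
toy_sock and toy_xprt structs, the saved_err field and every function name
here are hypothetical stand-ins, and the forced close simply overwriting
sk_err with ENOTCONN is an assumption made so the race is visible.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy stand-ins; the real struct sock and struct sock_xprt are far larger. */
struct toy_sock {
    int sk_err;             /* pending socket error */
};

struct toy_xprt {
    struct toy_sock sk;
    int saved_err;          /* hypothetical field: error seen at report time */
};

/* Error-report callback: copy the error while it is still intact. */
static void toy_error_report(struct toy_xprt *xprt)
{
    if (xprt->sk.sk_err != 0)
        xprt->saved_err = xprt->sk.sk_err;
}

/* Forced close; in this toy it replaces the pending error (assumption). */
static void toy_force_close(struct toy_xprt *xprt)
{
    xprt->sk.sk_err = ENOTCONN;
}

/* Late read: whatever the socket holds by the time tasks are woken. */
static int toy_wake_err_late(const struct toy_xprt *xprt)
{
    return xprt->sk.sk_err;
}

/* Snapshot read: consume the value captured by toy_error_report(). */
static int toy_wake_err_saved(struct toy_xprt *xprt)
{
    int err = xprt->saved_err;

    xprt->saved_err = 0;
    return err;
}

/* Run report -> forced close -> wake, and return both candidate errors. */
static void toy_scenario(int *late, int *saved)
{
    struct toy_xprt x = { { ECONNREFUSED }, 0 };

    assert(late != NULL && saved != NULL);

    toy_error_report(&x);
    toy_force_close(&x);
    *late = toy_wake_err_late(&x);
    *saved = toy_wake_err_saved(&x);
}
```

In toy_scenario() the late read comes back as ENOTCONN, while the saved
copy is still the ECONNREFUSED that the report callback saw.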

There's another subtle change here besides that race: SO_ERROR can return
the socket's soft error, not just what's in sk_err.  That can be fun things
like EINVAL if routing lookups fail.
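
A toy userspace model of that difference, as I understand it: an SO_ERROR
style read falls back to the soft error when sk_err is empty, and clears
whichever one it returns.  The toy_* names are invented for the sketch, and
only the sk_err / sk_err_soft field names are borrowed from struct sock.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy socket carrying only the two error fields discussed above. */
struct toy_sock {
    int sk_err;       /* hard, pending error */
    int sk_err_soft;  /* soft error, e.g. from a failed routing lookup */
};

/* Old style: look at sk_err only, consuming it. */
static int toy_read_sk_err(struct toy_sock *sk)
{
    int err;

    assert(sk != NULL);
    err = sk->sk_err;
    sk->sk_err = 0;
    return err;
}

/* SO_ERROR style: prefer sk_err, otherwise hand back the soft error. */
static int toy_so_error(struct toy_sock *sk)
{
    int err = toy_read_sk_err(sk);

    if (err == 0) {
        err = sk->sk_err_soft;
        sk->sk_err_soft = 0;
    }
    return err;
}
```

With sk_err == 0 and sk_err_soft == EINVAL, toy_read_sk_err() reports no
error, but toy_so_error() reports EINVAL.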

Ben
