On Fri, 2024-10-11 at 18:18 +0000, Chuck Lever III wrote: > > > On Oct 9, 2024, at 4:26 PM, Jeff Layton <jlayton@xxxxxxxxxx> wrote: > > > > On Wed, 2024-09-04 at 10:23 -0400, Chuck Lever wrote: > > > On Mon, Sep 02, 2024 at 11:57:55AM +1000, NeilBrown wrote: > > > > On Sun, 01 Sep 2024, syzbot wrote: > > > > > syzbot has found a reproducer for the following issue on: > > > > > > > > I had a poke around using the provided disk image and kernel for > > > > exploring. > > > > > > > > I think the problem is demonstrated by this stack : > > > > > > > > [<0>] rpc_wait_bit_killable+0x1b/0x160 > > > > [<0>] __rpc_execute+0x723/0x1460 > > > > [<0>] rpc_execute+0x1ec/0x3f0 > > > > [<0>] rpc_run_task+0x562/0x6c0 > > > > [<0>] rpc_call_sync+0x197/0x2e0 > > > > [<0>] rpcb_register+0x36b/0x670 > > > > [<0>] svc_unregister+0x208/0x730 > > > > [<0>] svc_bind+0x1bb/0x1e0 > > > > [<0>] nfsd_create_serv+0x3f0/0x760 > > > > [<0>] nfsd_nl_listener_set_doit+0x135/0x1a90 > > > > [<0>] genl_rcv_msg+0xb16/0xec0 > > > > [<0>] netlink_rcv_skb+0x1e5/0x430 > > > > > > > > No rpcbind is running on this host so that "svc_unregister" takes a > > > > long time. Maybe not forever but if a few of these get queued up all > > > > blocking some other thread, then maybe that pushed it over the limit. > > > > > > > > The fact that rpcbind is not running might not be relevant as the test > > > > messes up the network. "ping 127.0.0.1" stops working. > > > > > > > > So this bug comes down to "we try to contact rpcbind while holding a > > > > mutex and if that gets no response and no error, then we can hold the > > > > mutex for a long time". > > > > > > > > Are we surprised? Do we want to fix this? Any suggestions how? > > > > > > In the past, we've tried to address "hanging upcall" issues where > > > the kernel part of an administrative command needs a user space > > > service that isn't working or present. (eg mount needing a running > > > gssd) > > > > > > If NFSD is using the kernel RPC client for the upcall, then maybe > > > adding the RPC_TASK_SOFTCONN flag might turn the hang into an > > > immediate failure. > > > > > > IMO this should be addressed. > > > > > > > > > > I sent a patch that does the above, but now I'm wondering if we ought > > to take another approach. The listener array can be pretty long. What > > if we instead were to just drop and reacquire the mutex in the loop at > > strategic points? Then we wouldn't squat on the mutex for so long. > > > > Something like this maybe? It's ugly but it might prevent hung task > > warnings, and listener setup isn't a fastpath anyway. > > > > > > diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c > > index 3adbc05ebaac..5de01fb4c557 100644 > > --- a/fs/nfsd/nfsctl.c > > +++ b/fs/nfsd/nfsctl.c > > @@ -2042,7 +2042,9 @@ int nfsd_nl_listener_set_doit(struct sk_buff *skb, struct genl_info *info) > > > > set_bit(XPT_CLOSE, &xprt->xpt_flags); > > spin_unlock_bh(&serv->sv_lock); > > > > svc_xprt_close(xprt); > > + > > + /* ensure we don't squat on the mutex for too long */ > > + mutex_unlock(&nfsd_mutex); > > + mutex_lock(&nfsd_mutex); > > spin_lock_bh(&serv->sv_lock); > > } > > > > @@ -2082,6 +2084,10 @@ int nfsd_nl_listener_set_doit(struct sk_buff *skb, struct genl_info *info) > > /* always save the latest error */ > > if (ret < 0) > > err = ret; > > + > > + /* ensure we don't squat on the mutex for too long */ > > + mutex_unlock(&nfsd_mutex); > > + mutex_lock(&nfsd_mutex); > > } > > > > if (!serv->sv_nrthreads && list_empty(&nn->nfsd_serv->sv_permsocks)) > > I had a look at the rpcb upcall code a couple of weeks ago. > I'm not convinced that setting SOFTCONN in all cases will > help but unfortunately the reasons for my skepticism have > all but leaked out of my head. > > Releasing and re-acquiring the mutex is often a sign of > a deeper problem. It might look that way, but in this case we're iterating over the list in the netlink call. There's no reason we need to hold the nfsd_mutex over that entire process. Probably we can reduce the scope a bit further and make this look a little less sketchy. I'll send a revised patch. > > I think you're in the right vicinity > but I'd like to better understand the actual cause of > the delay. The listener list shouldn't be all that long, > but maybe it has a unintentional loop in it? > I don't think that's possible given that this is in a netlink request, but I'm no expert there. > I wish we had a reproducer for these issues. The old interface only allowed one listener to be set at a time. The new one allows a list, and I don't think it's bounded in any way. You could send down hundreds of listeners at once, and if they all end up hanging for a little while trying to talk to rpcbind, then that could easily cause the hung task warning. -- Jeff Layton <jlayton@xxxxxxxxxx>