> On Oct 11, 2024, at 5:08 PM, NeilBrown <neilb@xxxxxxx> wrote:
>
> On Sat, 12 Oct 2024, Chuck Lever III wrote:
>>
>>
>>> On Oct 9, 2024, at 4:26 PM, Jeff Layton <jlayton@xxxxxxxxxx> wrote:
>>>
>>> On Wed, 2024-09-04 at 10:23 -0400, Chuck Lever wrote:
>>>> On Mon, Sep 02, 2024 at 11:57:55AM +1000, NeilBrown wrote:
>>>>> On Sun, 01 Sep 2024, syzbot wrote:
>>>>>> syzbot has found a reproducer for the following issue on:
>>>>>
>>>>> I had a poke around using the provided disk image and kernel for
>>>>> exploring.
>>>>>
>>>>> I think the problem is demonstrated by this stack:
>>>>>
>>>>> [<0>] rpc_wait_bit_killable+0x1b/0x160
>>>>> [<0>] __rpc_execute+0x723/0x1460
>>>>> [<0>] rpc_execute+0x1ec/0x3f0
>>>>> [<0>] rpc_run_task+0x562/0x6c0
>>>>> [<0>] rpc_call_sync+0x197/0x2e0
>>>>> [<0>] rpcb_register+0x36b/0x670
>>>>> [<0>] svc_unregister+0x208/0x730
>>>>> [<0>] svc_bind+0x1bb/0x1e0
>>>>> [<0>] nfsd_create_serv+0x3f0/0x760
>>>>> [<0>] nfsd_nl_listener_set_doit+0x135/0x1a90
>>>>> [<0>] genl_rcv_msg+0xb16/0xec0
>>>>> [<0>] netlink_rcv_skb+0x1e5/0x430
>>>>>
>>>>> No rpcbind is running on this host, so "svc_unregister" takes a
>>>>> long time. Maybe not forever, but if a few of these get queued up,
>>>>> all blocking some other thread, then maybe that pushed it over the
>>>>> limit.
>>>>>
>>>>> The fact that rpcbind is not running might not be relevant, as the
>>>>> test messes up the network. "ping 127.0.0.1" stops working.
>>>>>
>>>>> So this bug comes down to "we try to contact rpcbind while holding
>>>>> a mutex, and if that gets no response and no error, then we can
>>>>> hold the mutex for a long time".
>>>>>
>>>>> Are we surprised? Do we want to fix this? Any suggestions how?
>>>>
>>>> In the past, we've tried to address "hanging upcall" issues where
>>>> the kernel part of an administrative command needs a user space
>>>> service that isn't working or present (e.g., mount needing a
>>>> running gssd).
>>>>
>>>> If NFSD is using the kernel RPC client for the upcall, then maybe
>>>> adding the RPC_TASK_SOFTCONN flag might turn the hang into an
>>>> immediate failure.
>>>>
>>>> IMO this should be addressed.
>>>>
>>>>
>>>
>>> I sent a patch that does the above, but now I'm wondering if we ought
>>> to take another approach. The listener array can be pretty long. What
>>> if we instead were to just drop and reacquire the mutex in the loop at
>>> strategic points? Then we wouldn't squat on the mutex for so long.
>>>
>>> Something like this maybe? It's ugly, but it might prevent hung task
>>> warnings, and listener setup isn't a fastpath anyway.
>>>
>>>
>>> diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
>>> index 3adbc05ebaac..5de01fb4c557 100644
>>> --- a/fs/nfsd/nfsctl.c
>>> +++ b/fs/nfsd/nfsctl.c
>>> @@ -2042,7 +2042,9 @@ int nfsd_nl_listener_set_doit(struct sk_buff *skb, struct genl_info *info)
>>>
>>>  		set_bit(XPT_CLOSE, &xprt->xpt_flags);
>>>  		spin_unlock_bh(&serv->sv_lock);
>>>
>>>  		svc_xprt_close(xprt);
>>> +
>>> +		/* ensure we don't squat on the mutex for too long */
>>> +		mutex_unlock(&nfsd_mutex);
>>> +		mutex_lock(&nfsd_mutex);
>>>  		spin_lock_bh(&serv->sv_lock);
>>>  	}
>>>
>>> @@ -2082,6 +2084,10 @@ int nfsd_nl_listener_set_doit(struct sk_buff *skb, struct genl_info *info)
>>>  		/* always save the latest error */
>>>  		if (ret < 0)
>>>  			err = ret;
>>> +
>>> +		/* ensure we don't squat on the mutex for too long */
>>> +		mutex_unlock(&nfsd_mutex);
>>> +		mutex_lock(&nfsd_mutex);
>>>  	}
>>>
>>>  	if (!serv->sv_nrthreads && list_empty(&nn->nfsd_serv->sv_permsocks))
>>
>> I had a look at the rpcb upcall code a couple of weeks ago.
>> I'm not convinced that setting SOFTCONN in all cases will
>> help, but unfortunately the reasons for my skepticism have
>> all but leaked out of my head.
>>
>> Releasing and re-acquiring the mutex is often a sign of
>> a deeper problem. I think you're in the right vicinity,
>> but I'd like to better understand the actual cause of
>> the delay. The listener list shouldn't be all that long,
>> but maybe it has an unintentional loop in it?
>
> I think it is wrong to register with rpcbind while holding a mutex.
> Registering with rpcbind doesn't need to be synchronous, does it?
> Could we punt that to a workqueue?
> Do we need to get a failure status back somehow??
> wait_for_completion_killable() somewhere??

I think kernel RPC service start-up needs to fail immediately if
rpcbind registration doesn't work.

--
Chuck Lever
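
For reference, a rough sketch of the fail-fast direction discussed
above. This is not the actual net/sunrpc/rpcb_clnt.c code and the
helper name is made up for illustration; only rpc_call_sync() and the
RPC_TASK_SOFT/RPC_TASK_SOFTCONN flags are existing sunrpc interfaces.

#include <linux/sunrpc/clnt.h>
#include <linux/sunrpc/sched.h>

/*
 * Hypothetical sketch: issue the rpcbind upcall so that it returns an
 * error promptly instead of blocking under nfsd_mutex when rpcbind is
 * unreachable.
 */
static int rpcb_register_call_failfast(struct rpc_clnt *clnt,
				       struct rpc_message *msg)
{
	/*
	 * RPC_TASK_SOFTCONN: fail if the transport cannot connect,
	 * instead of waiting for rpcbind to appear.
	 * RPC_TASK_SOFT: let the request time out rather than retry
	 * forever, so svc_unregister()/svc_bind() get an error back
	 * and the caller can drop nfsd_mutex.
	 */
	return rpc_call_sync(clnt, msg, RPC_TASK_SOFT | RPC_TASK_SOFTCONN);
}

With an error returned promptly, nfsd_create_serv() could fail the
netlink operation right away rather than holding nfsd_mutex while the
rpcbind request retries.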