On Sun, 01 Sep 2024, syzbot wrote:
> syzbot has found a reproducer for the following issue on:

I had a poke around using the provided disk image and kernel for
exploring.

I think the problem is demonstrated by this stack:

[<0>] rpc_wait_bit_killable+0x1b/0x160
[<0>] __rpc_execute+0x723/0x1460
[<0>] rpc_execute+0x1ec/0x3f0
[<0>] rpc_run_task+0x562/0x6c0
[<0>] rpc_call_sync+0x197/0x2e0
[<0>] rpcb_register+0x36b/0x670
[<0>] svc_unregister+0x208/0x730
[<0>] svc_bind+0x1bb/0x1e0
[<0>] nfsd_create_serv+0x3f0/0x760
[<0>] nfsd_nl_listener_set_doit+0x135/0x1a90
[<0>] genl_rcv_msg+0xb16/0xec0
[<0>] netlink_rcv_skb+0x1e5/0x430

No rpcbind is running on this host so that "svc_unregister" takes a
long time. Maybe not forever but if a few of these get queued up all
blocking some other thread, then maybe that pushed it over the limit.

The fact that rpcbind is not running might not be relevant as the test
messes up the network. "ping 127.0.0.1" stops working.

So this bug comes down to "we try to contact rpcbind while holding a
mutex and if that gets no response and no error, then we can hold the
mutex for a long time".

Are we surprised? Do we want to fix this? Any suggestions how?

NeilBrown