Hi,

On Mon, Jan 13, 2025 at 11:12 PM Chuck Lever <chuck.lever@xxxxxxxxxx> wrote:
>
> On 1/12/25 7:42 AM, Rik Theys wrote:
> > On Fri, Jan 10, 2025 at 11:07 PM Chuck Lever <chuck.lever@xxxxxxxxxx> wrote:
> >>
> >> On 1/10/25 3:51 PM, Rik Theys wrote:
> >>> Are there any debugging commands we can run once the issue happens
> >>> that can help to determine the cause of this issue?
> >>
> >> Once the issue happens, the precipitating bug has already done its
> >> damage, so at that point it is too late.
>
> I've studied the code and bug reports a bit. I see one intriguing
> mention in comment #5:
>
> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1071562#5
>
> /proc/130/stack:
> [<0>] rpc_shutdown_client+0xf2/0x150 [sunrpc]
> [<0>] nfsd4_process_cb_update+0x4c/0x270 [nfsd]
> [<0>] nfsd4_run_cb_work+0x9f/0x150 [nfsd]
> [<0>] process_one_work+0x1c7/0x380
> [<0>] worker_thread+0x4d/0x380
> [<0>] kthread+0xda/0x100
> [<0>] ret_from_fork+0x22/0x30
>
> This tells me that the active item on the callback_wq is waiting for the
> backchannel RPC client to shut down. This is probably the proximal cause
> of the callback workqueue stall.
>
> rpc_shutdown_client() is waiting for the client's cl_tasks to become
> empty. Typically this is a short wait. But here, there's one or more RPC
> requests that are not completing.
>
> Please issue these two commands on your server once it gets into the
> hung state:
>
> # rpcdebug -m rpc -c
> # echo t > /proc/sysrq-trigger

There were no rpcdebug entries configured, so I don't think the first
command did much.

You can find the output from the second command in the attachment.

Regards,
Rik

>
> Then gift-wrap the server's system journal and send it to me. I need to
> see only the output from these two commands, so if you want to
> anonymize the journal and truncate it to just the day of the failure,
> I think that should be fine.
>
>
> --
> Chuck Lever

--
Rik
Attachment:
journal.txt.gz
Description: application/gzip
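The point in the thread that rpc_shutdown_client() cannot return until the client's cl_tasks list is empty can be illustrated with a small user-space C model. This is a sketch under stated assumptions, not kernel code: struct model_task, struct model_client, tasks_empty(), killall_tasks(), shutdown_client() and the max_rounds parameter are all hypothetical names. The loop shape (signal every task, wait with a timeout, re-check) is loosely modeled on how the shutdown loop in net/sunrpc/clnt.c appears to work in recent kernels; the real loop has no round limit, so a single task that never completes keeps the callback worker blocked indefinitely.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for an RPC task (not struct rpc_task). */
struct model_task {
	bool active;   /* still on the client's task list */
	bool killable; /* finishes once it has been signalled */
};

/* Hypothetical stand-in for an RPC client and its cl_tasks list. */
struct model_client {
	struct model_task *tasks;
	size_t ntasks;
};

/* True when no task remains on the list. */
bool tasks_empty(const struct model_client *clnt)
{
	for (size_t i = 0; i < clnt->ntasks; i++)
		if (clnt->tasks[i].active)
			return false;
	return true;
}

/* Rough analogue of killing all tasks: in this model a killable task
 * completes immediately, while a stuck task stays on the list. */
void killall_tasks(struct model_client *clnt)
{
	for (size_t i = 0; i < clnt->ntasks; i++)
		if (clnt->tasks[i].active && clnt->tasks[i].killable)
			clnt->tasks[i].active = false;
}

/* Kill, wait, re-check until the list drains. max_rounds exists only so
 * the model can report a stall; returns the number of rounds used, or -1
 * if the list never became empty (where the real worker would hang). */
int shutdown_client(struct model_client *clnt, int max_rounds)
{
	int rounds = 0;

	while (!tasks_empty(clnt)) {
		if (rounds == max_rounds)
			return -1;
		killall_tasks(clnt);
		/* the kernel would sleep here with a timeout before re-checking */
		rounds++;
	}
	return rounds;
}
```

With only killable tasks, shutdown_client() returns after one round; add one task whose killable flag is false and it returns -1, which corresponds to the stalled callback_wq item in the stack trace above.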