Re: nfsd stuck in D (disk sleep) state

On Mon, 2024-10-28 at 13:27 -0400, Jeff Layton wrote:
> On Mon, 2024-10-28 at 08:46 -0400, Tom Talpey wrote:
> > On 10/28/2024 5:18 AM, Benoît Gschwind wrote:
> > > Hello,
> > > 
> > > The issue triggered again. I've attached the result of:
> > > 
> > > # dmesg -W | tee dmesg.txt
> > > 
> > > using:
> > > 
> > > # echo t > /proc/sysrq-trigger
> > > 
> > > I have the following PID stuck:
> > > 
> > >      1474 D (disk sleep)       0:54:58.602 [nfsd]
> > >      1475 D (disk sleep)       0:54:58.602 [nfsd]
> > >      1484 D (disk sleep)       0:54:58.602 [nfsd]
> > >      1495 D (disk sleep)       0:54:58.602 [nfsd]
> > 
> > Hmm, 1495 is stuck in nfsd4_create_session
> > 
> >  > [427468.304955] task:nfsd            state:D stack:0     pid:1495 ppid:2      flags:0x00004000
> >  > [427468.304962] Call Trace:
> >  > [427468.304965]  <TASK>
> >  > [427468.304971]  __schedule+0x34d/0x9e0
> >  > [427468.304983]  schedule+0x5a/0xd0
> >  > [427468.304991]  schedule_timeout+0x118/0x150
> >  > [427468.305003]  wait_for_completion+0x86/0x160
> >  > [427468.305015]  __flush_workqueue+0x152/0x420
> >  > [427468.305031]  nfsd4_create_session+0x79f/0xba0 [nfsd]
> >  > [427468.305092]  nfsd4_proc_compound+0x34c/0x660 [nfsd]
> >  > [427468.305147]  nfsd_dispatch+0x1a1/0x2b0 [nfsd]
> >  > [427468.305199]  svc_process_common+0x295/0x610 [sunrpc]
> >  > [427468.305269]  ? svc_recv+0x491/0x810 [sunrpc]
> >  > [427468.305337]  ? nfsd_svc+0x370/0x370 [nfsd]
> >  > [427468.305389]  ? nfsd_shutdown_threads+0x90/0x90 [nfsd]
> >  > [427468.305437]  svc_process+0xad/0x100 [sunrpc]
> >  > [427468.305505]  nfsd+0x99/0x140 [nfsd]
> >  > [427468.305555]  kthread+0xda/0x100
> >  > [427468.305562]  ? kthread_complete_and_exit+0x20/0x20
> >  > [427468.305572]  ret_from_fork+0x22/0x30
> > 
> > and the other three are stuck in nfsd4_destroy_session
> > 
> 
> All 4 processes are stuck waiting on a flush_workqueue call. It's
> probably one of these:
> 
>     flush_workqueue(clp->cl_callback_wq);
> 
> So the question here is really -- why are the callback workqueue jobs
> stuck? I do see this:
> 
> [427468.316839] Workqueue: nfsd4_callbacks nfsd4_run_cb_work [nfsd]
> [427468.316899] Call Trace:
> [427468.316902]  <TASK>
> [427468.316908]  __schedule+0x34d/0x9e0
> [427468.316919]  schedule+0x5a/0xd0
> [427468.316927]  schedule_timeout+0x94/0x150
> [427468.316937]  ? __bpf_trace_tick_stop+0x10/0x10
> [427468.316947]  rpc_shutdown_client+0xf2/0x150 [sunrpc]
> [427468.317015]  ? cpuusage_read+0x10/0x10
> [427468.317026]  nfsd4_process_cb_update+0x4c/0x270 [nfsd]
> [427468.317097]  nfsd4_run_cb_work+0x9f/0x150 [nfsd]
> [427468.317146]  process_one_work+0x1c7/0x380
> [427468.317158]  worker_thread+0x4d/0x380
> [427468.317170]  ? rescuer_thread+0x3a0/0x3a0
> [427468.317177]  kthread+0xda/0x100
> [427468.317185]  ? kthread_complete_and_exit+0x20/0x20
> [427468.317195]  ret_from_fork+0x22/0x30
> [427468.317213]  </TASK>
> 
> Maybe the RPC client is having trouble clearing clnt->cl_tasks?
> 

Interestingly, I hit this today while testing the patches that widen
the backchannel:

[  484.550350] INFO: task nfsd:1168 blocked for more than 120 seconds.
[  484.551746]       Not tainted 6.12.0-rc5+ #188
[  484.552854] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  484.586316] task:nfsd            state:D stack:0     pid:1168  tgid:1168  ppid:2      flags:0x00004000
[  484.588417] Call Trace:
[  484.589070]  <TASK>
[  484.589624]  __schedule+0x4e4/0xb90
[  484.622306]  schedule+0x26/0xf0
[  484.623162]  schedule_timeout+0x121/0x150
[  484.624344]  ? __smp_call_single_queue+0xa7/0x110
[  484.625375]  ? x2apic_send_IPI+0x45/0x50
[  484.654299]  ? ttwu_queue_wakelist+0xbf/0x110
[  484.655454]  __wait_for_common+0x91/0x1c0
[  484.656372]  ? __pfx_schedule_timeout+0x10/0x10
[  484.657429]  __flush_workqueue+0x13a/0x400
[  484.686304]  nfsd4_create_session+0x9cd/0xd90 [nfsd]
[  484.687562]  nfsd4_proc_compound+0x396/0x750 [nfsd]
[  484.688746]  nfsd_dispatch+0xc6/0x210 [nfsd]
[  484.689835]  svc_process_common+0x4e6/0x6d0 [sunrpc]
[  484.722305]  ? __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
[  484.723532]  svc_process+0x12d/0x1c0 [sunrpc]
[  484.724619]  svc_recv+0x7cf/0x980 [sunrpc]
[  484.725618]  ? __pfx_nfsd+0x10/0x10 [nfsd]
[  484.758308]  nfsd+0x9f/0x100 [nfsd]
[  484.759184]  kthread+0xdd/0x110
[  484.759905]  ? __pfx_kthread+0x10/0x10
[  484.760870]  ret_from_fork+0x30/0x50
[  484.761603]  ? __pfx_kthread+0x10/0x10
[  484.790302]  ret_from_fork_asm+0x1a/0x30
[  484.791151]  </TASK>

The host hung after this, so I can't fully confirm the scenario
below, but I think it's probably correct:

The client might want to create a new session. When that happens, the
create_session call tries to flush_workqueue() on the client-wide
backchannel workqueue.
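
Putting the two stack traces together, the dependency chain looks
something like this (a simplified sketch of the call paths, not the
exact code):

    /* nfsd thread, stuck in D state: */
    nfsd4_create_session()
      flush_workqueue(clp->cl_callback_wq)     /* waits for... */

    /* ...the queued callback work, which never completes: */
    nfsd4_run_cb_work()
      nfsd4_process_cb_update()
        rpc_shutdown_client()  /* waits for clnt->cl_tasks to empty */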

That workqueue is stuck because it's waiting for all of the callback
RPCs to die off (as evidenced by the stack trace in Benoît's report
showing it hung in rpc_shutdown_client()). So the question at this
point is why those RPC tasks aren't dying like they're supposed to
when rpc_killall_tasks() is run.
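
For reference, rpc_shutdown_client() just loops, killing the client's
tasks and waiting for cl_tasks to drain (paraphrased from
net/sunrpc/clnt.c, with tracing details elided):

    void rpc_shutdown_client(struct rpc_clnt *clnt)
    {
            might_sleep();

            /* kick the tasks until the list drains */
            while (!list_empty(&clnt->cl_tasks)) {
                    rpc_killall_tasks(clnt);
                    wait_event_timeout(destroy_wait,
                                       list_empty(&clnt->cl_tasks),
                                       1*HZ);
            }

            rpc_release_client(clnt);
    }

That 1-second wait_event_timeout() is what shows up as
schedule_timeout() in the workqueue stack above: if some task never
exits, the loop spins forever and the flush_workqueue() never
returns.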

I'm guessing that the backchannel-widening patches may make this a bit
easier to reproduce? In any case, this problem doesn't seem to be
confined to v6.1. 

I'm still looking at this, so stay tuned...
-- 
Jeff Layton <jlayton@xxxxxxxxxx>
