> On Oct 28, 2024, at 8:46 AM, Tom Talpey <tom@xxxxxxxxxx> wrote:
>
> On 10/28/2024 5:18 AM, Benoît Gschwind wrote:
>> Hello,
>> The issue triggered again; I attached the result of:
>> # dmesg -W | tee dmesg.txt
>> using:
>> # echo t > /proc/sysrq-trigger
>> I have the following PIDs stuck:
>> 1474 D (disk sleep) 0:54:58.602 [nfsd]
>> 1475 D (disk sleep) 0:54:58.602 [nfsd]
>> 1484 D (disk sleep) 0:54:58.602 [nfsd]
>> 1495 D (disk sleep) 0:54:58.602 [nfsd]
>
> Hmm, 1495 is stuck in nfsd4_create_session
>
> > [427468.304955] task:nfsd state:D stack:0 pid:1495 ppid:2 flags:0x00004000
> > [427468.304962] Call Trace:
> > [427468.304965] <TASK>
> > [427468.304971] __schedule+0x34d/0x9e0
> > [427468.304983] schedule+0x5a/0xd0
> > [427468.304991] schedule_timeout+0x118/0x150
> > [427468.305003] wait_for_completion+0x86/0x160
> > [427468.305015] __flush_workqueue+0x152/0x420
> > [427468.305031] nfsd4_create_session+0x79f/0xba0 [nfsd]
> > [427468.305092] nfsd4_proc_compound+0x34c/0x660 [nfsd]
> > [427468.305147] nfsd_dispatch+0x1a1/0x2b0 [nfsd]
> > [427468.305199] svc_process_common+0x295/0x610 [sunrpc]
> > [427468.305269] ? svc_recv+0x491/0x810 [sunrpc]
> > [427468.305337] ? nfsd_svc+0x370/0x370 [nfsd]
> > [427468.305389] ? nfsd_shutdown_threads+0x90/0x90 [nfsd]
> > [427468.305437] svc_process+0xad/0x100 [sunrpc]
> > [427468.305505] nfsd+0x99/0x140 [nfsd]
> > [427468.305555] kthread+0xda/0x100
> > [427468.305562] ? kthread_complete_and_exit+0x20/0x20
> > [427468.305572] ret_from_fork+0x22/0x30
>
> and the other three are stuck in nfsd4_destroy_session
>
> > [427468.298315] task:nfsd state:D stack:0 pid:1474 ppid:2 flags:0x00004000
> > [427468.298322] Call Trace:
> > [427468.298326] <TASK>
> > [427468.298332] __schedule+0x34d/0x9e0
> > [427468.298343] schedule+0x5a/0xd0
> > [427468.298350] schedule_timeout+0x118/0x150
> > [427468.298362] wait_for_completion+0x86/0x160
> > [427468.298375] __flush_workqueue+0x152/0x420
> > [427468.298392] nfsd4_destroy_session+0x1b6/0x250 [nfsd]
> > [427468.298456] nfsd4_proc_compound+0x34c/0x660 [nfsd]
> > [427468.298515] nfsd_dispatch+0x1a1/0x2b0 [nfsd]
> > [427468.298568] svc_process_common+0x295/0x610 [sunrpc]
> > [427468.298643] ? svc_recv+0x491/0x810 [sunrpc]
> > [427468.298711] ? nfsd_svc+0x370/0x370 [nfsd]
> > [427468.298776] ? nfsd_shutdown_threads+0x90/0x90 [nfsd]
> > [427468.298825] svc_process+0xad/0x100 [sunrpc]
> > [427468.298896] nfsd+0x99/0x140 [nfsd]
> > [427468.298946] kthread+0xda/0x100
> > [427468.298954] ? kthread_complete_and_exit+0x20/0x20
> > [427468.298963] ret_from_fork+0x22/0x30

There aren't a lot of 6.1-era changes in either of these, but there
are some interesting behavior updates around session create replay
from early this year. I wonder if the 6.1 server is mishandling an
nfserr_jukebox situation in nfsd4_create_session. For reference:

10396f4df8b7 ("nfsd: hold a lighter-weight client reference over CB_RECALL_ANY")
99dc2ef0397d ("NFSD: CREATE_SESSION must never cache NFS4ERR_DELAY replies")
e4469c6cc69b ("NFSD: Fix the NFSv4.1 CREATE_SESSION operation")

A network capture, once these processes are stuck, might confirm
CREATE_SESSION slot replay as a culprit.
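In the meantime, a quick way to grab just the stuck threads' kernel
stacks the next time this happens, without wading through a full
sysrq-t dump, is something like the sketch below. This is an untested
example, not a polished tool; reading /proc/<pid>/stack requires root
and a kernel built with CONFIG_STACKTRACE:

#!/usr/bin/env python3
# Untested sketch: find tasks in D (uninterruptible) state and dump
# their kernel stacks.
import glob

for stat_path in glob.glob('/proc/[0-9]*/stat'):
    try:
        with open(stat_path) as f:
            # /proc/<pid>/stat is "pid (comm) state ...". The comm
            # field may contain spaces, so split off everything after
            # the last ')'.
            head, rest = f.read().rsplit(')', 1)
    except (OSError, ValueError):
        continue  # the task exited while we were scanning
    if rest.split()[0] != 'D':
        continue
    pid = stat_path.split('/')[2]
    name = head.split('(', 1)[-1]
    try:
        with open('/proc/%s/stack' % pid) as f:
            stack = f.read()
    except OSError:
        stack = '(kernel stack unavailable)\n'
    print('=== pid %s (%s) ===\n%s' % (pid, name, stack))

Correlating those stacks with the CREATE_SESSION traffic in the
capture (everything to and from port 2049 should be enough) would
tell us whether the slot replay theory holds.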
> Was the client actually attempting to mount or unmount?
>
> Tom.

>
>> Thanks in advance,
>> Best regards
>> On Wednesday, 23 October 2024 at 19:38 +0000, Chuck Lever III wrote:
>>>
>>>
>>>> On Oct 23, 2024, at 3:27 PM, Benoît Gschwind
>>>> <benoit.gschwind@xxxxxxxxxxxxxxxxx> wrote:
>>>>
>>>> Hello,
>>>>
>>>> I have an NFS server running Debian 11 (Linux hostname 6.1.0-25-amd64
>>>> #1 SMP PREEMPT_DYNAMIC Debian 6.1.106-3 (2024-08-26) x86_64 GNU/Linux).
>>>>
>>>> Under some heavy workloads, some nfsd threads go into the D state and
>>>> seem to never leave it. I wrote a Python script to monitor how long a
>>>> process stays in a particular state, and I used it to watch the nfsd
>>>> threads; a simplified sketch of the script follows the listing below.
>>>> I get the following result:
>>>>
>>>> [...]
>>>> 178056 I (idle) 0:25:24.475 [nfsd]
>>>> 178057 I (idle) 0:25:24.475 [nfsd]
>>>> 178058 I (idle) 0:25:24.475 [nfsd]
>>>> 178059 I (idle) 0:25:24.475 [nfsd]
>>>> 178060 I (idle) 0:25:24.475 [nfsd]
>>>> 178061 I (idle) 0:25:24.475 [nfsd]
>>>> 178062 I (idle) 0:24:15.638 [nfsd]
>>>> 178063 I (idle) 0:24:13.488 [nfsd]
>>>> 178064 I (idle) 0:24:13.488 [nfsd]
>>>> 178065 I (idle) 0:00:00.000 [nfsd]
>>>> 178066 I (idle) 0:00:00.000 [nfsd]
>>>> 178067 I (idle) 0:00:00.000 [nfsd]
>>>> 178068 I (idle) 0:00:00.000 [nfsd]
>>>> 178069 S (sleeping) 0:00:02.147 [nfsd]
>>>> 178070 S (sleeping) 0:00:02.147 [nfsd]
>>>> 178071 S (sleeping) 0:00:02.147 [nfsd]
>>>> 178072 S (sleeping) 0:00:02.147 [nfsd]
>>>> 178073 S (sleeping) 0:00:02.147 [nfsd]
>>>> 178074 D (disk sleep) 1:29:25.809 [nfsd]
>>>> 178075 S (sleeping) 0:00:02.147 [nfsd]
>>>> 178076 S (sleeping) 0:00:02.147 [nfsd]
>>>> 178077 S (sleeping) 0:00:02.147 [nfsd]
>>>> 178078 S (sleeping) 0:00:02.147 [nfsd]
>>>> 178079 S (sleeping) 0:00:02.147 [nfsd]
>>>> 178080 D (disk sleep) 1:29:25.809 [nfsd]
>>>> 178081 D (disk sleep) 1:29:25.809 [nfsd]
>>>> 178082 D (disk sleep) 0:28:04.444 [nfsd]
>>>>
>>>> All processes not shown are in the idle state. The columns are: PID,
>>>> state, state name, the amount of time the state has not changed
>>>> (while the process was not interrupted), and the Name entry from
>>>> /proc/PID/status.
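>>>> The script boils down to something like this (a simplified sketch,
>>>> not the exact code; the real script also formats the duration and
>>>> reports every state, not just D):
>>>>
>>>> #!/usr/bin/env python3
>>>> # Simplified sketch: poll each task's state from /proc/<pid>/stat
>>>> # and report how long the state has stayed unchanged.
>>>> import glob
>>>> import time
>>>>
>>>> def task_state(stat_path):
>>>>     with open(stat_path) as f:
>>>>         # "pid (comm) state ..."; comm may contain spaces, so
>>>>         # split on the last ')'.
>>>>         return f.read().rsplit(')', 1)[1].split()[0]
>>>>
>>>> last_state, since = {}, {}
>>>> while True:
>>>>     now = time.monotonic()
>>>>     for path in glob.glob('/proc/[0-9]*/stat'):
>>>>         try:
>>>>             state = task_state(path)
>>>>         except (OSError, IndexError):
>>>>             continue  # task exited between glob() and open()
>>>>         pid = path.split('/')[2]
>>>>         if last_state.get(pid) != state:
>>>>             last_state[pid], since[pid] = state, now
>>>>         if state == 'D':
>>>>             print('%s D (disk sleep) %.3fs' % (pid, now - since[pid]))
>>>>     time.sleep(1)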
>>>>
>>>> As you can see, some nfsd processes have been in the disk sleep state
>>>> for more than an hour, yet looking at the disk activity there is
>>>> almost no I/O.
>>>>
>>>> I tried to restart nfs-server, but I get the following errors from
>>>> the kernel:
>>>>
>>>> oct. 23 11:59:49 hostname kernel: rpc-srv/tcp: nfsd: got error -104 when sending 20 bytes - shutting down socket
>>>> oct. 23 11:59:49 hostname kernel: rpc-srv/tcp: nfsd: got error -104 when sending 20 bytes - shutting down socket
>>>> oct. 23 11:59:49 hostname kernel: rpc-srv/tcp: nfsd: got error -104 when sending 20 bytes - shutting down socket
>>>> oct. 23 11:59:49 hostname kernel: rpc-srv/tcp: nfsd: got error -104 when sending 20 bytes - shutting down socket
>>>> oct. 23 11:59:49 hostname kernel: rpc-srv/tcp: nfsd: got error -104 when sending 20 bytes - shutting down socket
>>>> oct. 23 11:59:59 hostname kernel: rpc-srv/tcp: nfsd: got error -104 when sending 20 bytes - shutting down socket
>>>> oct. 23 11:59:59 hostname kernel: rpc-srv/tcp: nfsd: got error -104 when sending 20 bytes - shutting down socket
>>>> oct. 23 11:59:59 hostname kernel: rpc-srv/tcp: nfsd: got error -104 when sending 20 bytes - shutting down socket
>>>> oct. 23 11:59:59 hostname kernel: rpc-srv/tcp: nfsd: got error -104 when sending 20 bytes - shutting down socket
>>>> oct. 23 11:59:59 hostname kernel: rpc-srv/tcp: nfsd: got error -104 when sending 20 bytes - shutting down socket
>>>> oct. 23 12:00:09 hostname kernel: rpc-srv/tcp: nfsd: got error -104 when sending 20 bytes - shutting down socket
>>>> oct. 23 12:00:09 hostname kernel: rpc-srv/tcp: nfsd: got error -104 when sending 20 bytes - shutting down socket
>>>> oct. 23 12:00:09 hostname kernel: rpc-srv/tcp: nfsd: got error -104 when sending 20 bytes - shutting down socket
>>>> oct. 23 12:00:10 hostname kernel: rpc-srv/tcp: nfsd: got error -32 when sending 20 bytes - shutting down socket
>>>> oct. 23 12:00:10 hostname kernel: rpc-srv/tcp: nfsd: got error -32 when sending 20 bytes - shutting down socket
>>>>
>>>> The only way to recover seems to be a reboot, and I suspect that even
>>>> that only completes because the kernel forces it after a timeout.
>>>>
>>>> My setup involves, in order:
>>>> - the SCSI driver
>>>> - mdraid (RAID 6) on top of SCSI
>>>> - btrfs on top of mdraid
>>>> - nfsd on top of btrfs
>>>>
>>>> As expected, the setup is not very fast, but it seems that in some
>>>> situations nfsd never leaves the disk sleep state. The export options
>>>> are: gss/krb5i(rw,sync,no_wdelay,no_subtree_check,fsid=XXXXX). The
>>>> situation is not common, but it always happens at some point. For
>>>> instance, in the case I report here, my server booted on 2024-10-01
>>>> and got stuck around 2024-10-23. Using no_wdelay greatly reduced the
>>>> frequency of the issue (at first I thought it had solved it).
>>>>
>>>> My guess: a hardware bug, a SCSI bug, a btrfs bug, or an nfsd bug?
>>>>
>>>> Any clue or advice on this topic is welcome.
>>>
>>> Generate stack traces for each process on the system
>>> using "echo t | sudo tee /proc/sysrq-trigger" and then
>>> examine the output in the system journal. Note the
>>> stack contents for the processes that look stuck.
>>>
>>> --
>>> Chuck Lever
>>>
>>>
>

--
Chuck Lever