Re: nfsd stuck in D (disk sleep) state

Benoît Gschwind <benoit.gschwind@xxxxxxxxxxxxxxxxx> · Mon, 28 Oct 2024 10:18:18 +0100

Hello,

The issue triggered again. I have attached the output of:

# dmesg -W | tee dmesg.txt

captured while triggering a task dump with:

# echo t > /proc/sysrq-trigger
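
Since the dump covers every task on the system, here is a minimal Python sketch for pulling out only the nfsd tasks in D state. It assumes the task header format printed by 6.1-era kernels ("task:<comm> state:<c> ... pid:<n> ..."); the function name blocked_tasks is mine, and the regular expression may need adjusting for other kernel versions.

```python
import re

# Assumed header line of a sysrq-t task entry (6.1-era kernels), e.g.:
#   [  100.000001] task:nfsd            state:D stack:0     pid:1474  ppid:2 ...
# \bpid: is used so that the "ppid:" field is not matched by mistake.
TASK_RE = re.compile(
    r"task:(?P<comm>.{1,15}?)\s+state:(?P<state>\S)\s.*?\bpid:(?P<pid>\d+)"
)

def blocked_tasks(lines, comm="nfsd", state="D"):
    """Return {pid: [trace lines]} for tasks matching comm and state.

    Every line between one task header and the next is attributed to the
    preceding task, so unrelated kernel messages logged during the dump
    (or the summary after the last task) may end up in a trace.
    """
    traces = {}
    current = None
    for line in lines:
        m = TASK_RE.search(line)
        if m:
            if m["comm"].strip() == comm and m["state"] == state:
                current = traces.setdefault(int(m["pid"]), [])
            else:
                current = None  # header of a task we are not interested in
            continue
        if current is not None:
            current.append(line.rstrip("\n"))
    return traces
```

It can be fed an open file directly, for example blocked_tasks(open("dmesg.txt", errors="replace")), and the returned lists printed per PID.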

The following PIDs are stuck:

    1474 D (disk sleep)       0:54:58.602 [nfsd]
    1475 D (disk sleep)       0:54:58.602 [nfsd]
    1484 D (disk sleep)       0:54:58.602 [nfsd]
    1495 D (disk sleep)       0:54:58.602 [nfsd]
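
A listing like the one above can be produced by polling /proc. The following is only a rough sketch of that idea, not the script I actually use: the function names are hypothetical, and because it samples at an interval it can miss a state change that happens and reverts between two polls.

```python
import os
import time

def parse_stat(text):
    """Parse the contents of /proc/<pid>/stat into (pid, comm, state)."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    left, _, rest = text.partition("(")
    comm, _, tail = rest.rpartition(")")
    return int(left), comm, tail.split()[0]

def snapshot(comm="nfsd"):
    """Return {pid: state} for every task in /proc whose comm matches."""
    states = {}
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/stat") as f:
                pid, name, state = parse_stat(f.read())
        except OSError:
            continue  # task exited between listdir() and open()
        if name == comm:
            states[pid] = state
    return states

def update(seen, states, now):
    """Record when each pid entered its current state.

    seen maps pid -> (state, time first observed in that state) and is
    modified in place. Returns {pid: (state, seconds in that state)}.
    """
    for pid, state in states.items():
        if pid not in seen or seen[pid][0] != state:
            seen[pid] = (state, now)
    for pid in list(seen):
        if pid not in states:
            del seen[pid]  # task is gone
    return {pid: (state, now - since) for pid, (state, since) in seen.items()}

def watch(interval=1.0):
    """Print nfsd tasks currently in D state, with time spent there."""
    seen = {}
    while True:
        report = update(seen, snapshot(), time.monotonic())
        for pid, (state, secs) in sorted(report.items()):
            if state == "D":
                print(f"{pid} {state} {secs:10.3f}s [nfsd]")
        time.sleep(interval)
```

Calling watch() runs until interrupted; the durations only count from the moment the monitor first saw the task in that state.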

Thanks in advance,
Best regards

On Wednesday, 23 October 2024 at 19:38 +0000, Chuck Lever III wrote:
> 
> 
> > On Oct 23, 2024, at 3:27 PM, Benoît Gschwind
> > <benoit.gschwind@xxxxxxxxxxxxxxxxx> wrote:
> > 
> > Hello,
> > 
> > I have an NFS server running Debian 11 (Linux hostname 6.1.0-25-amd64 #1
> > SMP PREEMPT_DYNAMIC Debian 6.1.106-3 (2024-08-26) x86_64 GNU/Linux).
> > 
> > Under heavy workloads, some nfsd threads go into D state and seem
> > never to leave it. I wrote a Python script to monitor how long a
> > process stays in a particular state, and I use it to monitor the
> > nfsd threads. I get the following result:
> > 
> > [...]
> > 178056 I (idle) 0:25:24.475 [nfsd]
> > 178057 I (idle) 0:25:24.475 [nfsd]
> > 178058 I (idle) 0:25:24.475 [nfsd]
> > 178059 I (idle) 0:25:24.475 [nfsd]
> > 178060 I (idle) 0:25:24.475 [nfsd]
> > 178061 I (idle) 0:25:24.475 [nfsd]
> > 178062 I (idle) 0:24:15.638 [nfsd]
> > 178063 I (idle) 0:24:13.488 [nfsd]
> > 178064 I (idle) 0:24:13.488 [nfsd]
> > 178065 I (idle) 0:00:00.000 [nfsd]
> > 178066 I (idle) 0:00:00.000 [nfsd]
> > 178067 I (idle) 0:00:00.000 [nfsd]
> > 178068 I (idle) 0:00:00.000 [nfsd]
> > 178069 S (sleeping) 0:00:02.147 [nfsd]
> > 178070 S (sleeping) 0:00:02.147 [nfsd]
> > 178071 S (sleeping) 0:00:02.147 [nfsd]
> > 178072 S (sleeping) 0:00:02.147 [nfsd]
> > 178073 S (sleeping) 0:00:02.147 [nfsd]
> > 178074 D (disk sleep) 1:29:25.809 [nfsd]
> > 178075 S (sleeping) 0:00:02.147 [nfsd]
> > 178076 S (sleeping) 0:00:02.147 [nfsd]
> > 178077 S (sleeping) 0:00:02.147 [nfsd]
> > 178078 S (sleeping) 0:00:02.147 [nfsd]
> > 178079 S (sleeping) 0:00:02.147 [nfsd]
> > 178080 D (disk sleep) 1:29:25.809 [nfsd]
> > 178081 D (disk sleep) 1:29:25.809 [nfsd]
> > 178082 D (disk sleep) 0:28:04.444 [nfsd]
> > 
> > All processes not shown are in the idle state. The columns are:
> > PID, state, state name, the amount of time for which the state has
> > not changed and the process has not been interrupted, and the Name
> > entry from /proc/PID/status.
> > 
> > As you can see, some nfsd processes have been in disk sleep for
> > more than 1 hour, yet looking at the disk activity there is almost
> > no I/O.
> > 
> > I tried to restart nfs-server, but I get the following errors from
> > the kernel:
> > 
> > oct. 23 11:59:49 hostname kernel: rpc-srv/tcp: nfsd: got error -104
> > when sending 20 bytes - shutting down socket
> > oct. 23 11:59:49 hostname kernel: rpc-srv/tcp: nfsd: got error -104
> > when sending 20 bytes - shutting down socket
> > oct. 23 11:59:49 hostname kernel: rpc-srv/tcp: nfsd: got error -104
> > when sending 20 bytes - shutting down socket
> > oct. 23 11:59:49 hostname kernel: rpc-srv/tcp: nfsd: got error -104
> > when sending 20 bytes - shutting down socket
> > oct. 23 11:59:49 hostname kernel: rpc-srv/tcp: nfsd: got error -104
> > when sending 20 bytes - shutting down socket
> > oct. 23 11:59:59 hostname kernel: rpc-srv/tcp: nfsd: got error -104
> > when sending 20 bytes - shutting down socket
> > oct. 23 11:59:59 hostname kernel: rpc-srv/tcp: nfsd: got error -104
> > when sending 20 bytes - shutting down socket
> > oct. 23 11:59:59 hostname kernel: rpc-srv/tcp: nfsd: got error -104
> > when sending 20 bytes - shutting down socket
> > oct. 23 11:59:59 hostname kernel: rpc-srv/tcp: nfsd: got error -104
> > when sending 20 bytes - shutting down socket
> > oct. 23 11:59:59 hostname kernel: rpc-srv/tcp: nfsd: got error -104
> > when sending 20 bytes - shutting down socket
> > oct. 23 12:00:09 hostname kernel: rpc-srv/tcp: nfsd: got error -104
> > when sending 20 bytes - shutting down socket
> > oct. 23 12:00:09 hostname kernel: rpc-srv/tcp: nfsd: got error -104
> > when sending 20 bytes - shutting down socket
> > oct. 23 12:00:09 hostname kernel: rpc-srv/tcp: nfsd: got error -104
> > when sending 20 bytes - shutting down socket
> > oct. 23 12:00:10 hostname kernel: rpc-srv/tcp: nfsd: got error -32
> > when sending 20 bytes - shutting down socket
> > oct. 23 12:00:10 hostname kernel: rpc-srv/tcp: nfsd: got error -32
> > when sending 20 bytes - shutting down socket
> > 
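
For reference, the numbers in the messages above are negated errno values. On Linux, 104 is ECONNRESET (connection reset by peer) and 32 is EPIPE (broken pipe). A small Python sketch for decoding such codes on the machine where it runs; the helper name decode_kernel_error is only illustrative, and the numbering is platform specific, so it should be run on Linux:

```python
import errno
import os

def decode_kernel_error(code):
    """Map a negative kernel return code such as -104 to (name, description).

    Uses the errno table of the platform running this script.
    """
    num = abs(code)
    return errno.errorcode.get(num, "UNKNOWN"), os.strerror(num)

# The two codes seen in the rpc-srv/tcp messages.
for code in (-104, -32):
    print(code, *decode_kernel_error(code))
```

Both errors mean the peer had already dropped the TCP connection by the time nfsd tried to send its reply.
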
> > The only way to recover seems to be to reboot the machine. I guess
> > this works because the kernel forces the reboot after a given
> > timeout.
> > 
> > My setup involves, in order:
> > - SCSI driver
> > - mdraid on top of SCSI (RAID6)
> > - btrfs on top of mdraid
> > - nfsd on top of btrfs
> > 
> > 
> > The setup is not very fast, as expected, but in some situations
> > nfsd never leaves the disk sleep state. The export options are:
> > gss/krb5i(rw,sync,no_wdelay,no_subtree_check,fsid=XXXXX). The
> > situation is not common, but it always happens at some point. For
> > instance, in the case I report here, my server was booted on
> > 2024-10-01 and got stuck around 2024-10-23. I greatly reduced the
> > frequency of the issue by using no_wdelay (I thought I had solved
> > the issue when I started to use this option).
> > 
> > My guess is a hardware bug, SCSI bug, btrfs bug or nfsd bug?
> > 
> > Any clue on this topic or any advice is welcome.
> 
> Generate stack traces for each process on the system
> using "echo t | sudo tee /proc/sysrq-trigger" and then
> examine the output in the system journal. Note the
> stack contents for the processes that look stuck.
> 
> --
> Chuck Lever
> 
> 

Attachment:
dmesg.txt.gz

Description: application/gzip