Hi,

On 2023/04/07 0:59, Christian Herzog wrote:
Dear all,

disclaimer: this email was originally posted to linux-nfs since we believed the problem to be in nfsd, but Chuck Lever suggested that rq_qos_wait hinted at a problem further down in the storage stack and referred us to you guys, so here we are:

For our researchers we run file servers in the hundreds-of-TiB to low-PiB range that export via NFS and SMB. Storage is iSCSI-over-InfiniBand LUNs, LVM'ed into individual XFS file systems. With Ubuntu 18.04 nearing EOL, we prepared an upgrade to Debian bookworm, and tests went well.

About a week after one of the upgrades, we ran into the first occurrence of our problem: all of a sudden, all nfsd threads enter the D state and do not recover. The underlying file systems, however, seem fine and can still be read and written. The only way out appears to be to reboot the server. The only clues are the frozen nfsds and stack traces like

[<0>] rq_qos_wait+0xbc/0x130
[<0>] wbt_wait+0xa2/0x110
[<0>] __rq_qos_throttle+0x20/0x40
[<0>] blk_mq_submit_bio+0x2d3/0x580
[<0>] submit_bio_noacct_nocheck+0xf7/0x2c0
[<0>] iomap_submit_ioend+0x4b/0x80
[<0>] iomap_do_writepage+0x4b4/0x820
[<0>] write_cache_pages+0x180/0x4c0
[<0>] iomap_writepages+0x1c/0x40
[<0>] xfs_vm_writepages+0x79/0xb0 [xfs]
[<0>] do_writepages+0xbd/0x1c0
[<0>] filemap_fdatawrite_wbc+0x5f/0x80
[<0>] __filemap_fdatawrite_range+0x58/0x80
[<0>] file_write_and_wait_range+0x41/0x90
[<0>] xfs_file_fsync+0x5a/0x2a0 [xfs]
[<0>] nfsd_commit+0x93/0x190 [nfsd]
[<0>] nfsd4_commit+0x5e/0x90 [nfsd]
[<0>] nfsd4_proc_compound+0x352/0x660 [nfsd]
[<0>] nfsd_dispatch+0x167/0x280 [nfsd]
[<0>] svc_process_common+0x286/0x5e0 [sunrpc]
[<0>] svc_process+0xad/0x100 [sunrpc]
[<0>] nfsd+0xd5/0x190 [nfsd]
[<0>] kthread+0xe6/0x110
[<0>] ret_from_fork+0x1f/0x30
I'm not familiar with nfsd, but since the above thread is waiting for an in-flight request to complete, it would be helpful to monitor the following debugfs entries under /sys/kernel/debug/block/[device]/:

rqos/wbt/inflight
hctx*/tags
hctx*/sched_tags
hctx*/busy
hctx*/dispatch

This can give a preliminary answer as to whether I/O is simply too slow, or there is a bug and the I/O is hung.

Thanks,
Kuai
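For convenience, the debugfs entries listed above could be snapshotted in a loop with a small helper like the sketch below. It assumes debugfs is mounted at /sys/kernel/debug; the device name dm-3 in the example invocation is hypothetical, substitute whichever device backs the hung filesystem.

```shell
#!/bin/sh
# Sketch: snapshot the blk-mq debugfs entries for one block device.
# Assumes debugfs is mounted at /sys/kernel/debug (run as root).
snapshot_blk_debugfs() {
    base="$1"   # e.g. /sys/kernel/debug/block/dm-3 (device name is an assumption)
    for path in "$base"/rqos/wbt/inflight \
                "$base"/hctx*/tags "$base"/hctx*/sched_tags \
                "$base"/hctx*/busy "$base"/hctx*/dispatch; do
        [ -r "$path" ] || continue
        echo "==> $path"
        cat "$path"
    done
}

# Example: poll every 5 seconds while the hang is in progress:
# while sleep 5; do date; snapshot_blk_debugfs /sys/kernel/debug/block/dm-3; done
```

If the inflight counts stay pinned and the dispatch lists never drain, that points at a hang rather than slow I/O.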
(we've also seen nfsd3). It's very sporadic, we have no idea what's triggering it, and it has now happened 4 times on one server and once on a second. Needless to say, these are production systems, so we have a window of a few minutes for debugging before people start yelling. We've thrown everything we could at our test setup but so far haven't been able to trigger it.

Any pointers would be highly appreciated.

thanks and best regards,
-Christian

cat /etc/os-release
PRETTY_NAME="Debian GNU/Linux 12 (bookworm)"

uname -vr
6.1.0-7-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.20-1 (2023-03-19)

apt list --installed '*nfs*'
libnfsidmap1/testing,now 1:2.6.2-4 amd64 [installed,automatic]
nfs-common/testing,now 1:2.6.2-4 amd64 [installed]
nfs-kernel-server/testing,now 1:2.6.2-4 amd64 [installed]

nfsconf -d
[exportd]
 debug = all
[exportfs]
 debug = all
[general]
 pipefs-directory = /run/rpc_pipefs
[lockd]
 port = 32769
 udp-port = 32769
[mountd]
 debug = all
 manage-gids = True
 port = 892
[nfsd]
 debug = all
 port = 2049
 threads = 48
[nfsdcld]
 debug = all
[nfsdcltrack]
 debug = all
[sm-notify]
 debug = all
 outgoing-port = 846
[statd]
 debug = all
 outgoing-port = 2020
 port = 662
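Given the few-minutes debugging window mentioned above, it may help to have a prepared script that captures the kernel stacks of all D-state tasks in one pass before the reboot. The sketch below only reads standard /proc files and needs root to see /proc/[pid]/stack; nothing here is specific to nfsd.

```shell
#!/bin/sh
# Sketch: dump kernel stacks of all uninterruptible (D-state) tasks.
# Run as root during the hang; takes an optional proc root for testing.
dump_d_state_stacks() {
    proc="${1:-/proc}"
    for dir in "$proc"/[0-9]*; do
        [ -r "$dir/status" ] || continue
        state=$(awk '/^State:/ {print $2}' "$dir/status")
        [ "$state" = "D" ] || continue
        echo "==> $dir ($(cat "$dir/comm" 2>/dev/null))"
        cat "$dir/stack" 2>/dev/null
    done
    return 0
}
```

Alternatively, `echo w > /proc/sysrq-trigger` writes the stacks of all blocked tasks to the kernel log, which survives in dmesg/journal for later analysis.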