Hi,

On 2023/04/07 0:59, Christian Herzog wrote:
Dear all,

disclaimer: this email was originally posted to linux-nfs since we believed the problem to be in nfsd, but Chuck Lever suggested that rq_qos_wait hinted at a problem further down in the storage stack and referred us to you guys, so here we are:

For our researchers we run file servers in the hundreds-of-TiB to low-PiB range that export via NFS and SMB. Storage is iSCSI-over-InfiniBand LUNs, LVM'ed into individual XFS file systems. With Ubuntu 18.04 nearing EOL, we prepared an upgrade to Debian bookworm, and tests went well.

About a week after one of the upgrades, we ran into the first occurrence of our problem: all of a sudden, all nfsd threads enter the D state and do not recover. The underlying file systems, however, seem fine and can still be read and written. The only way out appears to be to reboot the server. The only clues are the frozen nfsds and stack traces like

[<0>] rq_qos_wait+0xbc/0x130
[<0>] wbt_wait+0xa2/0x110
[<0>] __rq_qos_throttle+0x20/0x40
[<0>] blk_mq_submit_bio+0x2d3/0x580
[<0>] submit_bio_noacct_nocheck+0xf7/0x2c0
[<0>] iomap_submit_ioend+0x4b/0x80
[<0>] iomap_do_writepage+0x4b4/0x820
[<0>] write_cache_pages+0x180/0x4c0
[<0>] iomap_writepages+0x1c/0x40
[<0>] xfs_vm_writepages+0x79/0xb0 [xfs]
[<0>] do_writepages+0xbd/0x1c0
[<0>] filemap_fdatawrite_wbc+0x5f/0x80
[<0>] __filemap_fdatawrite_range+0x58/0x80
[<0>] file_write_and_wait_range+0x41/0x90
[<0>] xfs_file_fsync+0x5a/0x2a0 [xfs]
[<0>] nfsd_commit+0x93/0x190 [nfsd]
[<0>] nfsd4_commit+0x5e/0x90 [nfsd]
[<0>] nfsd4_proc_compound+0x352/0x660 [nfsd]
[<0>] nfsd_dispatch+0x167/0x280 [nfsd]
[<0>] svc_process_common+0x286/0x5e0 [sunrpc]
[<0>] svc_process+0xad/0x100 [sunrpc]
[<0>] nfsd+0xd5/0x190 [nfsd]
[<0>] kthread+0xe6/0x110
[<0>] ret_from_fork+0x1f/0x30
I'm not familiar with nfsd, but since the above thread is waiting for an in-flight request to complete, it would be helpful to monitor the following debugfs entries under /sys/kernel/debug/block/[device]/:

rqos/wbt/inflight
hctx*/tags
hctx*/sched_tags
hctx*/busy
hctx*/dispatch

This can give a preliminary answer as to whether I/O is simply too slow, or there is a bug and the I/O is hung.

Thanks,
Kuai
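For convenience, the debugfs entries listed above could be snapshotted in a loop with a small helper like the sketch below. It assumes debugfs is mounted at /sys/kernel/debug; the device name dm-3 in the example invocation is hypothetical, substitute whichever device backs the hung filesystem.

```shell
#!/bin/sh
# Sketch: snapshot the blk-mq debugfs entries for one block device.
# Assumes debugfs is mounted at /sys/kernel/debug (run as root).
snapshot_blk_debugfs() {
    base="$1"   # e.g. /sys/kernel/debug/block/dm-3 (device name is an assumption)
    for path in "$base"/rqos/wbt/inflight \
                "$base"/hctx*/tags "$base"/hctx*/sched_tags \
                "$base"/hctx*/busy "$base"/hctx*/dispatch; do
        [ -r "$path" ] || continue
        echo "==> $path"
        cat "$path"
    done
}

# Example: poll every 5 seconds while the hang is in progress:
# while sleep 5; do date; snapshot_blk_debugfs /sys/kernel/debug/block/dm-3; done
```

If the inflight counts stay pinned and the dispatch lists never drain, that points at a hang rather than slow I/O.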
(we've also seen nfsd3). It's very sporadic, we have no idea what's triggering it, and it has now happened 4 times on one server and once on a second. Needless to say, these are production systems, so we have a window of a few minutes for debugging before people start yelling. We've thrown everything we could at our test setup but so far haven't been able to trigger it.

Any pointers would be highly appreciated.

thanks and best regards,
-Christian

cat /etc/os-release
PRETTY_NAME="Debian GNU/Linux 12 (bookworm)"

uname -vr
6.1.0-7-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.20-1 (2023-03-19)

apt list --installed '*nfs*'
libnfsidmap1/testing,now 1:2.6.2-4 amd64 [installed,automatic]
nfs-common/testing,now 1:2.6.2-4 amd64 [installed]
nfs-kernel-server/testing,now 1:2.6.2-4 amd64 [installed]

nfsconf -d
[exportd]
 debug = all
[exportfs]
 debug = all
[general]
 pipefs-directory = /run/rpc_pipefs
[lockd]
 port = 32769
 udp-port = 32769
[mountd]
 debug = all
 manage-gids = True
 port = 892
[nfsd]
 debug = all
 port = 2049
 threads = 48
[nfsdcld]
 debug = all
[nfsdcltrack]
 debug = all
[sm-notify]
 debug = all
 outgoing-port = 846
[statd]
 debug = all
 outgoing-port = 2020
 port = 662
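Given the few-minutes debugging window mentioned above, it may help to have a prepared script that captures the kernel stacks of all D-state tasks in one pass before the reboot. The sketch below only reads standard /proc files and needs root to see /proc/[pid]/stack; nothing here is specific to nfsd.

```shell
#!/bin/sh
# Sketch: dump kernel stacks of all uninterruptible (D-state) tasks.
# Run as root during the hang; takes an optional proc root for testing.
dump_d_state_stacks() {
    proc="${1:-/proc}"
    for dir in "$proc"/[0-9]*; do
        [ -r "$dir/status" ] || continue
        state=$(awk '/^State:/ {print $2}' "$dir/status")
        [ "$state" = "D" ] || continue
        echo "==> $dir ($(cat "$dir/comm" 2>/dev/null))"
        cat "$dir/stack" 2>/dev/null
    done
    return 0
}
```

Alternatively, `echo w > /proc/sysrq-trigger` writes the stacks of all blocked tasks to the kernel log, which survives in dmesg/journal for later analysis.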