Hey everybody, we are experiencing a similar issue to Dr. Herzog [1]. the details and our configuration is a bit different than his, but I'd assume that the same bug is behind this. It's a single server per region, with NFS is on local storage formatted as ext4. The servers don't have a swap and have plenty of free memory. 16G-20G used (without caches) out of 64G. The amount of data stored is rather small about 1G, with about 100.000 files over 17.000 directories We've upgraded the servers from Ubuntu 20.04 to Ubuntu 22.04 about a month ago and we had four outages since than.Our clients are still on Ubuntu 20.04 if this helps Our setup is rather small.But the impacts aprox. 10.000 users per region. And the packages in Ubuntu 22.04 are a bit older than the ones in Debian 12 (bookworm ) apt list --installed '*nfs*' Listing... Done libnfsidmap1/jammy-updates,now 1:2.6.1-1ubuntu1.2 amd64 [installed,automatic] nfs-common/jammy-updates,now 1:2.6.1-1ubuntu1.2 amd64 [installed] nfs-kernel-server/jammy-updates,now 1:2.6.1-1ubuntu1.2 amd64 [installed] and the kernel Linux redacted 5.19.0-1026-gcp #28~22.04.1-Ubuntu SMP Tue Jun 6 07:24:26 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux Stack-trace is a different from the one provided by Dr. Herzog. But the same between servers. so only full server restart helps. And once this happened almost immediately after the server was rebooted. This is the first time that it happened that day. Jun 26 11:34:07 redacted kernel: [326687.684554] receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 000000000dddae84 xid bb78d292 Jun 26 11:34:07 redacted kernel: [326687.684634] receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 000000000dddae84 xid ba78d292 Jun 26 11:41:38 redacted kernel: [327139.472177] receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 000000004df70d45 xid 3bfdc2ea Jun 26 11:44:30 redacted kernel: [327310.678539] INFO: task nfsd:848 blocked for more than 120 seconds. Jun 26 11:44:30 redacted kernel: [327310.685086] Not tainted 5.19.0-1026-gcp #28~22.04.1-Ubuntu Jun 26 11:44:30 redacted kernel: [327310.691358] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Jun 26 11:44:30 redacted kernel: [327310.699442] task:nfsd state:D stack: 0 pid: 848 ppid: 2 flags:0x00004000 Jun 26 11:44:30 redacted kernel: [327310.699451] Call Trace: Jun 26 11:44:30 redacted kernel: [327310.699456] <TASK> Jun 26 11:44:30 redacted kernel: [327310.699464] __schedule+0x248/0x5d0 Jun 26 11:44:30 redacted kernel: [327310.699478] schedule+0x58/0x100 Jun 26 11:44:30 redacted kernel: [327310.699482] schedule_timeout+0x10d/0x140 Jun 26 11:44:30 redacted kernel: [327310.699489] __wait_for_common+0x99/0x1b0 Jun 26 11:44:30 redacted kernel: [327310.699493] ? usleep_range_state+0xa0/0xa0 Jun 26 11:44:30 redacted kernel: [327310.699497] wait_for_completion+0x24/0x40 Jun 26 11:44:30 redacted kernel: [327310.699501] __flush_workqueue+0x140/0x3e0 Jun 26 11:44:30 redacted kernel: [327310.699512] nfsd4_probe_callback_sync+0x1a/0x30 [nfsd] Jun 26 11:44:30 redacted kernel: [327310.699566] nfsd4_destroy_session+0x184/0x230 [nfsd] Jun 26 11:44:30 redacted kernel: [327310.699597] nfsd4_proc_compound+0x42b/0x770 [nfsd] Jun 26 11:44:30 redacted kernel: [327310.699629] nfsd_dispatch+0x174/0x270 [nfsd] Jun 26 11:44:30 redacted kernel: [327310.699670] svc_process_common+0x2a5/0x610 [sunrpc] Jun 26 11:44:30 redacted kernel: [327310.699760] ? svc_handle_xprt+0x166/0x350 [sunrpc] Jun 26 11:44:30 redacted kernel: [327310.699807] ? nfsd_svc+0x1a0/0x1a0 [nfsd] Jun 26 11:44:30 redacted kernel: [327310.699835] ? nfsd_shutdown_threads+0xa0/0xa0 [nfsd] Jun 26 11:44:30 redacted kernel: [327310.699863] svc_process+0xba/0x110 [sunrpc] Jun 26 11:44:30 redacted kernel: [327310.699906] nfsd+0xd1/0x1a0 [nfsd] Jun 26 11:44:30 redacted kernel: [327310.699932] kthread+0xce/0xf0 Jun 26 11:44:30 redacted kernel: [327310.699937] ? kthread_complete_and_exit+0x20/0x20 Jun 26 11:44:30 redacted kernel: [327310.699941] ret_from_fork+0x1f/0x30 Jun 26 11:44:30 redacted kernel: [327310.699948] </TASK> This is about 10 minutes after reboot: Jun 26 12:45:13 redacted kernel: [ 373.274344] receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 0000000016b8f89c xid b4752784 Jun 26 12:49:06 redacted kernel: [ 605.568257] INFO: task nfsd:887 blocked for more than 120 seconds. Jun 26 12:49:06 redacted kernel: [ 605.574821] Not tainted 5.19.0-1026-gcp #28~22.04.1-Ubuntu Jun 26 12:49:06 redacted kernel: [ 605.581432] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Jun 26 12:49:06 redacted kernel: [ 605.589914] task:nfsd state:D stack: 0 pid: 887 ppid: 2 flags:0x00004000 Jun 26 12:49:06 redacted kernel: [ 605.589926] Call Trace: Jun 26 12:49:06 redacted kernel: [ 605.589931] <TASK> Jun 26 12:49:06 redacted kernel: [ 605.589937] __schedule+0x248/0x5d0 Jun 26 12:49:06 redacted kernel: [ 605.590112] schedule+0x58/0x100 Jun 26 12:49:06 redacted kernel: [ 605.590117] schedule_timeout+0x10d/0x140 Jun 26 12:49:06 redacted kernel: [ 605.590125] __wait_for_common+0x99/0x1b0 Jun 26 12:49:06 redacted kernel: [ 605.590129] ? usleep_range_state+0xa0/0xa0 Jun 26 12:49:06 redacted kernel: [ 605.590134] wait_for_completion+0x24/0x40 Jun 26 12:49:06 redacted kernel: [ 605.590138] __flush_workqueue+0x140/0x3e0 Jun 26 12:49:06 redacted kernel: [ 605.590147] nfsd4_probe_callback_sync+0x1a/0x30 [nfsd] Jun 26 12:49:06 redacted kernel: [ 605.590199] nfsd4_destroy_session+0x184/0x230 [nfsd] Jun 26 12:49:06 redacted kernel: [ 605.590235] nfsd4_proc_compound+0x42b/0x770 [nfsd] Jun 26 12:49:06 redacted kernel: [ 605.590270] nfsd_dispatch+0x174/0x270 [nfsd] Jun 26 12:49:06 redacted kernel: [ 605.590303] svc_process_common+0x2a5/0x610 [sunrpc] Jun 26 12:49:06 redacted kernel: [ 605.590392] ? svc_handle_xprt+0x166/0x350 [sunrpc] Jun 26 12:49:06 redacted kernel: [ 605.590447] ? nfsd_svc+0x1a0/0x1a0 [nfsd] Jun 26 12:49:06 redacted kernel: [ 605.590479] ? nfsd_shutdown_threads+0xa0/0xa0 [nfsd] Jun 26 12:49:06 redacted kernel: [ 605.590510] svc_process+0xba/0x110 [sunrpc] Jun 26 12:49:06 redacted kernel: [ 605.590561] nfsd+0xd1/0x1a0 [nfsd] Jun 26 12:49:06 redacted kernel: [ 605.590598] kthread+0xce/0xf0 Jun 26 12:49:06 redacted kernel: [ 605.590603] ? kthread_complete_and_exit+0x20/0x20 Jun 26 12:49:06 redacted kernel: [ 605.590608] ret_from_fork+0x1f/0x30 Jun 26 12:49:06 redacted kernel: [ 605.590616] </TASK> when things started to go really bad we gt another stacktrace: Jun 26 12:53:07 redacted kernel: [ 847.232530] INFO: task kworker/u8:3:5078 blocked for more than 120 seconds. Jun 26 12:53:07 redacted kernel: [ 847.239937] Not tainted 5.19.0-1026-gcp #28~22.04.1-Ubuntu Jun 26 12:53:08 redacted kernel: [ 847.246163] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Jun 26 12:53:08 redacted kernel: [ 847.254308] task:kworker/u8:3 state:D stack: 0 pid: 5078 ppid: 2 flags:0x00004000 Jun 26 12:53:08 redacted kernel: [ 847.254328] Workqueue: nfsd4 laundromat_main [nfsd] Jun 26 12:53:08 redacted kernel: [ 847.254438] Call Trace: Jun 26 12:53:08 redacted kernel: [ 847.254443] <TASK> Jun 26 12:53:08 redacted kernel: [ 847.254449] __schedule+0x248/0x5d0 Jun 26 12:53:08 redacted kernel: [ 847.254460] ? available_idle_cpu+0x66/0xa0 Jun 26 12:53:08 redacted kernel: [ 847.254467] schedule+0x58/0x100 Jun 26 12:53:08 redacted kernel: [ 847.254472] schedule_timeout+0x10d/0x140 Jun 26 12:53:08 redacted kernel: [ 847.254478] __wait_for_common+0x99/0x1b0 Jun 26 12:53:08 redacted kernel: [ 847.254482] ? usleep_range_state+0xa0/0xa0 Jun 26 12:53:08 redacted kernel: [ 847.254485] wait_for_completion+0x24/0x40 Jun 26 12:53:08 redacted kernel: [ 847.254489] __flush_workqueue+0x140/0x3e0 Jun 26 12:53:08 redacted kernel: [ 847.254494] nfsd4_shutdown_callback+0x4d/0x130 [nfsd] Jun 26 12:53:08 redacted kernel: [ 847.254525] ? nfsd4_return_all_client_layouts+0xc9/0x160 [nfsd] Jun 26 12:53:08 redacted kernel: [ 847.254556] ? nfsd4_shutdown_copy+0x70/0xb0 [nfsd] Jun 26 12:53:08 redacted kernel: [ 847.254600] __destroy_client+0x1a1/0x210 [nfsd] Jun 26 12:53:08 redacted kernel: [ 847.254630] expire_client+0x57/0x70 [nfsd] Jun 26 12:53:08 redacted kernel: [ 847.254659] nfs4_laundromat+0x26e/0x900 [nfsd] Jun 26 12:53:08 redacted kernel: [ 847.254690] ? finish_task_switch.isra.0+0x82/0x290 Jun 26 12:53:08 redacted kernel: [ 847.254697] laundromat_main+0x19/0x50 [nfsd] Jun 26 12:53:08 redacted kernel: [ 847.254726] process_one_work+0x222/0x3c0 Jun 26 12:53:08 redacted kernel: [ 847.254731] worker_thread+0x50/0x3f0 Jun 26 12:53:08 redacted kernel: [ 847.254734] ? process_one_work+0x3c0/0x3c0 Jun 26 12:53:08 redacted kernel: [ 847.254736] kthread+0xce/0xf0 Jun 26 12:53:08 redacted kernel: [ 847.254742] ? kthread_complete_and_exit+0x20/0x20 Jun 26 12:53:08 redacted kernel: [ 847.254746] ret_from_fork+0x1f/0x30 Jun 26 12:53:08 redacted kernel: [ 847.254754] </TASK> 1 - https://marc.info/?l=linux-nfs&m=168258759703478&w=2 Regards, Miha