Hi, Our Rocky 9 NFS server running the upstream 6.11.11 kernel is starting to log the following hung task messages: INFO: task kworker/u194:11:1677933 blocked for more than 215285 seconds. Tainted: G W E 6.11.11-1.el9.esat.x86_64 #1 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. task:kworker/u194:11 state:D stack:0 pid:1677933 tgid:1677933 ppid:2 flags:0x00004000 Workqueue: nfsd4 laundromat_main [nfsd] Call Trace: <TASK> __schedule+0x21c/0x5d0 ? preempt_count_add+0x47/0xa0 schedule+0x26/0xa0 nfsd4_shutdown_callback+0xea/0x120 [nfsd] ? __pfx_var_wake_function+0x10/0x10 __destroy_client+0x1f0/0x290 [nfsd] nfs4_process_client_reaplist+0xa1/0x110 [nfsd] nfs4_laundromat+0x126/0x7a0 [nfsd] ? _raw_spin_unlock_irqrestore+0x23/0x40 laundromat_main+0x16/0x40 [nfsd] process_one_work+0x179/0x390 worker_thread+0x239/0x340 ? __pfx_worker_thread+0x10/0x10 kthread+0xdb/0x110 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x2d/0x50 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1a/0x30 </TASK> If I read this correctly, it seems to be blocked on a callback operation during shutdown of a client connection? Is this a known issue that may be fixed in the 6.12.x kernel? Could the following commit be relevant? 8dd91e8d31febf4d9cca3ae1bb4771d33ae7ee5a nfsd: fix race between laundromat and free_stateid If I increase the hung_task_warnings sysctl and wait a few minutes, the hung task message appears again, so the issue is still present on the system. How can I debug which client is causing this issue? Is there any other information I can provide? Could this be related to the following thread: https://lore.kernel.org/linux-nfs/Z2vNQ6HXfG_LqBQc@xxxxxxxxxxx/T/#u ? I don't know if this is relevant but I've noticed that some clients have multiple entries in the /proc/fs/nfsd/clients directory, so I assume these clients are not cleaned up correctly? For example: clientid: 0x6d077c99675df2b3 address: "10.87.29.32:864" status: confirmed seconds from last renew: 0 name: "Linux NFSv4.2 betelgeuse.esat.kuleuven.be" minor version: 2 Implementation domain: "kernel.org" Implementation name: "Linux 4.18.0-553.32.1.el8_10.x86_64 #1 SMP Wed Dec 11 16:33:48 UTC 2024 x86_64" Implementation time: [0, 0] callback state: UP callback address: 10.87.29.32:0 admin-revoked states: 0 *** clientid: 0x6d0596d0675df2b3 address: "10.87.29.32:864" status: courtesy seconds from last renew: 2288446 name: "Linux NFSv4.2 betelgeuse.esat.kuleuven.be" minor version: 2 Implementation domain: "kernel.org" Implementation name: "Linux 4.18.0-553.32.1.el8_10.x86_64 #1 SMP Wed Dec 11 16:33:48 UTC 2024 x86_64" Implementation time: [0, 0] callback state: UP callback address: 10.87.29.32:0 admin-revoked states: 0 The first one has status confirmed and the second one "courtesy" with a high "seconds from last renew". The address and port matches for both client entries and the callback state is both UP. For another client, there's a different output: clientid: 0x6d078a79675df2b3 address: "10.33.130.34:864" status: unconfirmed seconds from last renew: 158910 name: "Linux NFSv4.2 bujarski.esat.kuleuven.be" minor version: 2 Implementation domain: "kernel.org" Implementation name: "Linux 5.14.0-503.19.1.el9_5.x86_64 #1 SMP PREEMPT_DYNAMIC Thu Dec 19 12:55:03 UTC 2024 x86_64" Implementation time: [0, 0] callback state: UNKNOWN callback address: (einval) admin-revoked states: 0 *** clientid: 0x6d078a7a675df2b3 address: "10.33.130.34:864" status: confirmed seconds from last renew: 2 name: "Linux NFSv4.2 bujarski.esat.kuleuven.be" minor version: 2 Implementation domain: "kernel.org" Implementation name: "Linux 5.14.0-503.19.1.el9_5.x86_64 #1 SMP PREEMPT_DYNAMIC Thu Dec 19 12:55:03 UTC 2024 x86_64" Implementation time: [0, 0] callback state: UP callback address: 10.33.130.34:0 admin-revoked states: 0 Here the first status is unconfirmed and the callback state is UNKNOWN. The clients are Rocky 8, Rocky 9 and Fedora 41 clients. Regards, Rik