Re: `rcu: INFO: rcu_sched self-detected stall on CPU` and spinning kworker `rpciod`

Hi Chuck,

On Sat, Aug 17, 2024 at 02:52:38PM +0000, Chuck Lever III wrote:
> 
> 
> > On Aug 17, 2024, at 4:39 AM, Salvatore Bonaccorso <carnil@xxxxxxxxxx> wrote:
> > 
> > Hi,
> > 
> > On Tue, Jul 30, 2024 at 02:52:47PM +0200, Paul Menzel wrote:
> >> Dear Salvatore, dear Chuck,
> >> 
> >> 
> >> Thank you for your messages.
> >> 
> >> 
> >> Am 30.07.24 um 14:19 schrieb Salvatore Bonaccorso:
> >> 
> >>> On Sat, Jul 27, 2024 at 09:19:24PM +0000, Chuck Lever III wrote:
> >>>> 
> >>>>> On Jul 27, 2024, at 5:15 PM, Salvatore Bonaccorso <carnil@xxxxxxxxxx> wrote:
> >> 
> >>>>> On Wed, Jul 17, 2024 at 07:33:24AM +0200, Paul Menzel wrote:
> >> 
> >>>>>> Using Linux 5.15.160 on a Dell PowerEdge T440/021KCD, BIOS 2.11.2
> >>>>>> 04/22/2021, a mount from another server hung. Linux logs:
> >>>>>> 
> >>>>>> ```
> >>>>>> $ dmesg -T
> >>>>>> [Wed Jul  3 16:39:34 2024] Linux version 5.15.160.mx64.476(root@xxxxxxxxxxxxxxxxxxxxxxxxx) (gcc (GCC) 12.3.0, GNU ld (GNU Binutils) 2.41) #1 SMP Wed Jun 5 12:24:15 CEST 2024
> >>>>>> [Wed Jul  3 16:39:34 2024] Command line: root=LABEL=root ro crashkernel=64G-:256M console=ttyS0,115200n8 console=tty0 init=/bin/systemd audit=0 random.trust_cpu=on systemd.unified_cgroup_hierarchy
> >>>>>> […]
> >>>>>> [Wed Jul  3 16:39:34 2024] DMI: Dell Inc. PowerEdge T440/021KCD, BIOS 2.11.2 04/22/2021
> >>>>>> […]
> >>>>>> [Tue Jul 16 06:00:10 2024] md: md3: data-check interrupted.
> >>>>>> [Tue Jul 16 11:06:01 2024] receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 00000000ee580afa xid 6890a3d2
> >>>>>> [Tue Jul 16 11:06:01 2024] receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 00000000d4d84570 xid 3ca4017a
> >> 
> >> […]
> >> 
> >>>>>> [Tue Jul 16 11:35:59 2024] receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 0000000028481e8f xid b682b676
> >>>>>> [Tue Jul 16 11:35:59 2024] receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 00000000c384ff38 xid b5d5dbf5
> >>>>>> [Tue Jul 16 11:36:40 2024] rcu: INFO: rcu_sched self-detected stall on CPU
> >>>>>> [Tue Jul 16 11:36:40 2024] rcu:  13-....: (20997 ticks this GP) idle=54f/1/0x4000000000000000 softirq=31904928/31904928 fqs=4433
> >>>>>> [Tue Jul 16 11:36:40 2024]  (t=21017 jiffies g=194958993 q=5715)
> >>>>>> [Tue Jul 16 11:36:40 2024] Task dump for CPU 13:
> >>>>>> [Tue Jul 16 11:36:40 2024] task:kworker/u34:2   state:R  running task stack:    0 pid:30413 ppid:     2 flags:0x00004008
> >>>>>> [Tue Jul 16 11:36:40 2024] Workqueue: rpciod rpc_async_schedule [sunrpc]
> >>>>>> [Tue Jul 16 11:36:40 2024] Call Trace:
> >>>>>> [Tue Jul 16 11:36:40 2024]  <IRQ>
> >>>>>> [Tue Jul 16 11:36:40 2024]  sched_show_task.cold+0xc2/0xda
> >>>>>> [Tue Jul 16 11:36:40 2024]  rcu_dump_cpu_stacks+0xa1/0xd3
> >>>>>> [Tue Jul 16 11:36:40 2024]  rcu_sched_clock_irq.cold+0xc7/0x1e7
> >>>>>> [Tue Jul 16 11:36:40 2024]  ? trigger_load_balance+0x6d/0x300
> >>>>>> [Tue Jul 16 11:36:40 2024]  ? scheduler_tick+0xda/0x260
> >>>>>> [Tue Jul 16 11:36:40 2024]  update_process_times+0xa1/0xe0
> >>>>>> [Tue Jul 16 11:36:40 2024]  tick_sched_timer+0x8e/0xa0
> >>>>>> [Tue Jul 16 11:36:40 2024]  ? tick_sched_do_timer+0x90/0x90
> >>>>>> [Tue Jul 16 11:36:40 2024]  __hrtimer_run_queues+0x139/0x2a0
> >>>>>> [Tue Jul 16 11:36:40 2024]  hrtimer_interrupt+0xf4/0x210
> >>>>>> [Tue Jul 16 11:36:40 2024]  __sysvec_apic_timer_interrupt+0x5f/0xe0
> >>>>>> [Tue Jul 16 11:36:40 2024]  sysvec_apic_timer_interrupt+0x69/0x90
> >>>>>> [Tue Jul 16 11:36:40 2024]  </IRQ>
> >>>>>> [Tue Jul 16 11:36:40 2024]  <TASK>
> >>>>>> [Tue Jul 16 11:36:40 2024]  asm_sysvec_apic_timer_interrupt+0x16/0x20
> >>>>>> [Tue Jul 16 11:36:40 2024] RIP: 0010:read_tsc+0x3/0x20
> >>>>>> [Tue Jul 16 11:36:40 2024] Code: cc cc cc cc cc cc cc 8b 05 56 ab 72 01 c3 cc cc cc cc 0f 1f 44 00 00 c3 cc cc cc cc 66 66 2e 0f 1f 84 00 00 00 00 00 0f 01 f9 <66> 90 48 c1 e2 20 48 09 d0 c3 cc cc cc cc 66 66 2e 0f 1f 84 00 00
> >>>>>> [Tue Jul 16 11:36:40 2024] RSP: 0018:ffffc900087cfe00 EFLAGS: 00000246
> >>>>>> [Tue Jul 16 11:36:40 2024] RAX: 00000000226dc8b8 RBX: 000000003f3c079e RCX: 000000000000100d
> >>>>>> [Tue Jul 16 11:36:40 2024] RDX: 00000000004535c4 RSI: 0000000000000046 RDI: ffffffff82435600
> >>>>>> [Tue Jul 16 11:36:40 2024] RBP: 0003ed08d3641da3 R08: ffffffffa012c770 R09: ffffffffa012c788
> >>>>>> [Tue Jul 16 11:36:40 2024] R10: 0000000000000003 R11: 0000000000000283 R12: 0000000000000000
> >>>>>> [Tue Jul 16 11:36:40 2024] R13: 0000000000000001 R14: ffff88909311c000 R15: ffff88909311c005
> >>>>>> [Tue Jul 16 11:36:40 2024]  ktime_get+0x38/0xa0
> >>>>>> [Tue Jul 16 11:36:40 2024]  ? rpc_sleep_on_priority+0x70/0x70 [sunrpc]
> >>>>>> [Tue Jul 16 11:36:40 2024]  rpc_exit_task+0x9a/0x100 [sunrpc]
> >>>>>> [Tue Jul 16 11:36:40 2024]  __rpc_execute+0x6e/0x410 [sunrpc]
> >>>>>> [Tue Jul 16 11:36:40 2024]  rpc_async_schedule+0x29/0x40 [sunrpc]
> >>>>>> [Tue Jul 16 11:36:40 2024]  process_one_work+0x1d7/0x3a0
> >>>>>> [Tue Jul 16 11:36:40 2024]  worker_thread+0x4a/0x3c0
> >>>>>> [Tue Jul 16 11:36:40 2024]  ? process_one_work+0x3a0/0x3a0
> >>>>>> [Tue Jul 16 11:36:40 2024]  kthread+0x115/0x140
> >>>>>> [Tue Jul 16 11:36:40 2024]  ? set_kthread_struct+0x50/0x50
> >>>>>> [Tue Jul 16 11:36:40 2024]  ret_from_fork+0x1f/0x30
> >>>>>> [Tue Jul 16 11:36:40 2024]  </TASK>
> >>>>>> [Tue Jul 16 11:37:19 2024] rcu: INFO: rcu_sched self-detected stall on CPU
> >>>>>> [Tue Jul 16 11:37:19 2024] rcu:  7-....: (21000 ticks this GP) idle=5b1/1/0x4000000000000000 softirq=29984492/29984492 fqs=5159
> >>>>>> [Tue Jul 16 11:37:19 2024]  (t=21017 jiffies g=194959001 q=2008)
> >>>>>> [Tue Jul 16 11:37:19 2024] Task dump for CPU 7:
> >>>>>> [Tue Jul 16 11:37:19 2024] task:kworker/u34:2   state:R  running task stack:    0 pid:30413 ppid:     2 flags:0x00004008
> >>>>>> [Tue Jul 16 11:37:19 2024] Workqueue: rpciod rpc_async_schedule [sunrpc]
> >>>>>> [Tue Jul 16 11:37:19 2024] Call Trace:
> >>>>>> [Tue Jul 16 11:37:19 2024]  <IRQ>
> >>>>>> [Tue Jul 16 11:37:19 2024]  sched_show_task.cold+0xc2/0xda
> >>>>>> [Tue Jul 16 11:37:19 2024]  rcu_dump_cpu_stacks+0xa1/0xd3
> >>>>>> [Tue Jul 16 11:37:19 2024]  rcu_sched_clock_irq.cold+0xc7/0x1e7
> >>>>>> [Tue Jul 16 11:37:19 2024]  ? trigger_load_balance+0x6d/0x300
> >>>>>> [Tue Jul 16 11:37:19 2024]  ? scheduler_tick+0xda/0x260
> >>>>>> [Tue Jul 16 11:37:19 2024]  update_process_times+0xa1/0xe0
> >>>>>> [Tue Jul 16 11:37:19 2024]  tick_sched_timer+0x8e/0xa0
> >>>>>> [Tue Jul 16 11:37:19 2024]  ? tick_sched_do_timer+0x90/0x90
> >>>>>> [Tue Jul 16 11:37:19 2024]  __hrtimer_run_queues+0x139/0x2a0
> >>>>>> [Tue Jul 16 11:37:19 2024]  hrtimer_interrupt+0xf4/0x210
> >>>>>> [Tue Jul 16 11:37:19 2024]  __sysvec_apic_timer_interrupt+0x5f/0xe0
> >>>>>> [Tue Jul 16 11:37:19 2024]  sysvec_apic_timer_interrupt+0x69/0x90
> >>>>>> [Tue Jul 16 11:37:19 2024]  </IRQ>
> >>>>>> [Tue Jul 16 11:37:19 2024]  <TASK>
> >>>>>> [Tue Jul 16 11:37:19 2024]  asm_sysvec_apic_timer_interrupt+0x16/0x20
> >>>>>> [Tue Jul 16 11:37:19 2024] RIP: 0010:_raw_spin_lock+0x10/0x20
> >>>>>> [Tue Jul 16 11:37:19 2024] Code: b8 00 02 00 00 f0 0f c1 07 a9 ff 01 00 00 75 05 c3 cc cc cc cc e9 f0 05 59 ff 0f 1f 44 00 00 31 c0 ba 01 00 00 00 f0 0f b1 17 <75> 05 c3 cc cc cc cc 89 c6 e9 62 02 59 ff 66 90 0f 1f 44 00 00 fa
> >>>>>> [Tue Jul 16 11:37:19 2024] RSP: 0018:ffffc900087cfe30 EFLAGS: 00000246
> >>>>>> [Tue Jul 16 11:37:19 2024] RAX: 0000000000000000 RBX: ffff88997131a500 RCX: 0000000000000001
> >>>>>> [Tue Jul 16 11:37:19 2024] RDX: 0000000000000001 RSI: ffff88997131a500 RDI: ffffffffa012c700
> >>>>>> [Tue Jul 16 11:37:19 2024] RBP: ffffffffa012c700 R08: ffffffffa012c770 R09: ffffffffa012c788
> >>>>>> [Tue Jul 16 11:37:19 2024] R10: 0000000000000003 R11: 0000000000000283 R12: ffff88997131a530
> >>>>>> [Tue Jul 16 11:37:19 2024] R13: 0000000000000001 R14: ffff88909311c000 R15: ffff88909311c005
> >>>>>> [Tue Jul 16 11:37:19 2024]  __rpc_execute+0x95/0x410 [sunrpc]
> >>>>>> [Tue Jul 16 11:37:19 2024]  rpc_async_schedule+0x29/0x40 [sunrpc]
> >>>>>> [Tue Jul 16 11:37:19 2024]  process_one_work+0x1d7/0x3a0
> >>>>>> [Tue Jul 16 11:37:19 2024]  worker_thread+0x4a/0x3c0
> >>>>>> [Tue Jul 16 11:37:19 2024]  ? process_one_work+0x3a0/0x3a0
> >>>>>> [Tue Jul 16 11:37:19 2024]  kthread+0x115/0x140
> >>>>>> [Tue Jul 16 11:37:19 2024]  ? set_kthread_struct+0x50/0x50
> >>>>>> [Tue Jul 16 11:37:19 2024]  ret_from_fork+0x1f/0x30
> >>>>>> [Tue Jul 16 11:37:19 2024]  </TASK>
> >>>>>> [Tue Jul 16 11:37:57 2024] rcu: INFO: rcu_sched self-detected stall on CPU
> >>>>>> […]
> >>>>>> ```
> >>>>> 
> >>>>> FWIW, on one NFS server occurrence we are seeing something very close
> >>>>> to the above, but in the 5.10.y case for the Debian kernel after
> >>>>> updating from 5.10.218-1 to 5.10.221-1, i.e. the kernel which had the
> >>>>> big NFS-related stack backported.
> >>>>> 
> >>>>> One backtrace we were able to catch was
> >>>>> 
> >>>>> [...]
> >>>>> Jul 27 15:24:52 nfsserver kernel: receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 000000003d26f7ec xid b172e1c6
> >>>>> Jul 27 15:24:52 nfsserver kernel: receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 0000000017f1552a xid a90d7751
> >>>>> Jul 27 15:24:52 nfsserver kernel: receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 000000006337c521 xid 8e5e58bd
> >>>>> Jul 27 15:24:54 nfsserver kernel: receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 00000000cbf89319 xid c2da3c73
> >>>>> Jul 27 15:24:54 nfsserver kernel: receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 00000000e2588a21 xid a01bfec6
> >>>>> Jul 27 15:24:54 nfsserver kernel: receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 000000005fda63ca xid c2eeeaa6
> >>>>> [...]
> >>>>> Jul 27 15:25:15 nfsserver kernel: rcu: INFO: rcu_sched self-detected stall on CPU
> >>>>> Jul 27 15:25:15 nfsserver kernel: rcu:         15-....: (5250 ticks this GP) idle=74e/1/0x4000000000000000 softirq=3160997/3161006 fqs=2233
> >>>>> Jul 27 15:25:15 nfsserver kernel:         (t=5255 jiffies g=8381377 q=106333)
> >>>>> Jul 27 15:25:15 nfsserver kernel: NMI backtrace for cpu 15
> >>>>> Jul 27 15:25:15 nfsserver kernel: CPU: 15 PID: 3725556 Comm: kworker/u42:4 Not tainted 5.10.0-31-amd64 #1 Debian 5.10.221-1
> >>>>> Jul 27 15:25:15 nfsserver kernel: Hardware name: DALCO AG S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0008.031920191559 03/19/2019
> >>>>> Jul 27 15:25:15 nfsserver kernel: Workqueue: rpciod rpc_async_schedule [sunrpc]
> >>>>> Jul 27 15:25:15 nfsserver kernel: Call Trace:
> >>>>> Jul 27 15:25:15 nfsserver kernel:  <IRQ>
> >>>>> Jul 27 15:25:15 nfsserver kernel:  dump_stack+0x6b/0x83
> >>>>> Jul 27 15:25:15 nfsserver kernel:  nmi_cpu_backtrace.cold+0x32/0x69
> >>>>> Jul 27 15:25:15 nfsserver kernel:  ? lapic_can_unplug_cpu+0x80/0x80
> >>>>> Jul 27 15:25:15 nfsserver kernel:  nmi_trigger_cpumask_backtrace+0xdf/0xf0
> >>>>> Jul 27 15:25:15 nfsserver kernel:  rcu_dump_cpu_stacks+0xa5/0xd7
> >>>>> Jul 27 15:25:15 nfsserver kernel:  rcu_sched_clock_irq.cold+0x202/0x3d9
> >>>>> Jul 27 15:25:15 nfsserver kernel:  ? trigger_load_balance+0x5a/0x220
> >>>>> Jul 27 15:25:15 nfsserver kernel:  update_process_times+0x8c/0xc0
> >>>>> Jul 27 15:25:15 nfsserver kernel:  tick_sched_handle+0x22/0x60
> >>>>> Jul 27 15:25:15 nfsserver kernel:  tick_sched_timer+0x65/0x80
> >>>>> Jul 27 15:25:15 nfsserver kernel:  ? tick_sched_do_timer+0x90/0x90
> >>>>> Jul 27 15:25:15 nfsserver kernel:  __hrtimer_run_queues+0x127/0x280
> >>>>> Jul 27 15:25:15 nfsserver kernel:  hrtimer_interrupt+0x110/0x2c0
> >>>>> Jul 27 15:25:15 nfsserver kernel:  __sysvec_apic_timer_interrupt+0x5c/0xe0
> >>>>> Jul 27 15:25:15 nfsserver kernel:  asm_call_irq_on_stack+0xf/0x20
> >>>>> Jul 27 15:25:15 nfsserver kernel:  </IRQ>
> >>>>> Jul 27 15:25:15 nfsserver kernel:  sysvec_apic_timer_interrupt+0x72/0x80
> >>>>> Jul 27 15:25:15 nfsserver kernel:  asm_sysvec_apic_timer_interrupt+0x12/0x20
> >>>>> Jul 27 15:25:15 nfsserver kernel: RIP: 0010:mod_delayed_work_on+0x5d/0x90
> >>>>> Jul 27 15:25:15 nfsserver kernel: Code: 00 4c 89 e7 e8 34 fe ff ff 89 c3 83 f8 f5 74 e9 85 c0 78 1b 89 ef 4c 89 f1 4c 89 e2 4c 89 ee e8 f9 fc ff ff 48 8b 3c 24 57 9d <0f> 1f 44 >
> >>>>> Jul 27 15:25:15 nfsserver kernel: RSP: 0018:ffffb5efe356fd90 EFLAGS: 00000246
> >>>>> Jul 27 15:25:15 nfsserver kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 000000003820000f
> >>>>> Jul 27 15:25:15 nfsserver kernel: RDX: 0000000038000000 RSI: 0000000000000046 RDI: 0000000000000246
> >>>>> Jul 27 15:25:15 nfsserver kernel: RBP: 0000000000002000 R08: ffffffffc0884430 R09: ffffffffc0884448
> >>>>> Jul 27 15:25:15 nfsserver kernel: R10: 0000000000000003 R11: 0000000000000003 R12: ffffffffc0884428
> >>>>> Jul 27 15:25:15 nfsserver kernel: R13: ffff8c89d0f6b800 R14: 00000000000001f4 R15: 0000000000000000
> >>>>> Jul 27 15:25:15 nfsserver kernel:  __rpc_sleep_on_priority_timeout+0x111/0x120 [sunrpc]
> >>>>> Jul 27 15:25:15 nfsserver kernel:  rpc_delay+0x56/0x90 [sunrpc]
> >>>>> Jul 27 15:25:15 nfsserver kernel:  rpc_exit_task+0x5a/0x140 [sunrpc]
> >>>>> Jul 27 15:25:15 nfsserver kernel:  ? __rpc_do_wake_up_task_on_wq+0x1e0/0x1e0 [sunrpc]
> >>>>> Jul 27 15:25:15 nfsserver kernel:  __rpc_execute+0x6d/0x410 [sunrpc]
> >>>>> Jul 27 15:25:15 nfsserver kernel:  rpc_async_schedule+0x29/0x40 [sunrpc]
> >>>>> Jul 27 15:25:15 nfsserver kernel:  process_one_work+0x1b3/0x350
> >>>>> Jul 27 15:25:15 nfsserver kernel:  worker_thread+0x53/0x3e0
> >>>>> Jul 27 15:25:15 nfsserver kernel:  ? process_one_work+0x350/0x350
> >>>>> Jul 27 15:25:15 nfsserver kernel:  kthread+0x118/0x140
> >>>>> Jul 27 15:25:15 nfsserver kernel:  ? __kthread_bind_mask+0x60/0x60
> >>>>> Jul 27 15:25:15 nfsserver kernel:  ret_from_fork+0x1f/0x30
> >>>>> [...]
> >>>>> 
> >>>>> Is there anything which could help debug this issue?
> >>>> 
> >>>> The backtrace suggests an issue in the RPC client code; the
> >>>> server's NFSv4.1 backchannel would use that to send callbacks.
> >>>> 
> >>>> Since 5.10.218 and 5.10.221 are only about a thousand commits
> >>>> apart, a bisect should be quick and narrow down the issue to
> >>>> one or two commits.
> >>> 
> >>> Yes indeed. Unfortunately I have so far been unable to reproduce the
> >>> issue in a more synthetic way in a test environment, and the affected
> >>> server in particular is a production system.
> >>> 
> >>> Paul, is your case in some way reproducible in a testing environment,
> >>> so that a bisection might give enough hints on the problem?
> >> We hit this issue once more on the same server with Linux 5.15.160, and had
> >> to hard reboot it.
> >> 
> >> Unfortunately we did not yet have time to set up a test system to find a
> >> reproducer. In our case a lot of compute servers seem to have accessed the
> >> NFS server. A lot of the many processes were `zstd` at first glance.
> > 
> > Neither did we, due to the nature of the server (a production system) and
> > our inability to reproduce the issue in a more controlled way in a test
> > environment.
> > 
> > In our case users seem to cause workloads involving the use of wandb.
> > 
> > What we tried is to boot the most recent kernel available from the 5.10.y
> > series (5.10.223-1). The issue still showed up. Since we disabled
> > fs.leases-enable, the situation seems to be more stable. While this
> > is/might not be the solution, does that give some additional hints?
> 
> The problem is backchannel-related, and disabling delegation
> will reduce the number of backchannel operations. Your finding
> comports with our current theory, but I can't think of how it
> narrows the field of suspects.
> 
> Is the server running short on memory, perhaps? One backchannel
> operation that was added in v5.10.220 is CB_RECALL_ANY, which
> is triggered on memory exhaustion. But that should be a fairly
> harmless addition unless there is a bug in there somewhere.
> 
> If your NFS server does not have any NFS mounts, then we could
> provide instructions for enabling client-side tracing to watch
> the details of callback traffic.

The NFS server also acts as an NFS client, so enabling more back-channel
related tracing will, I guess, just add more noise to the traces.
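
If we do end up trying it anyway, a rough sketch of what I would start
with is the following (assuming tracefs is mounted in the usual place;
the event selection is just a guess and would need to be narrowed down
to the callback-related tracepoints):

```
# Enable all SUNRPC and NFSv4 client tracepoints (coarse and noisy on a
# box that is both server and client).
echo 1 > /sys/kernel/tracing/events/sunrpc/enable
echo 1 > /sys/kernel/tracing/events/nfs4/enable
# Stream the ring buffer to a file until the problem shows up.
cat /sys/kernel/tracing/trace_pipe > /var/tmp/nfs-cb-trace.log
```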

But we got "lucky" and were able to trigger the issue twice in the last
few days, once NFSv4 delegations were enabled again and some users
started to put more load on that specific server.

I did issue 

	rpcdebug -m rpc -c

before rebooting/resetting the server, which reported:

Jan 30 05:27:05 nfsserver kernel: 26407 2281   -512 3d1fdb92        0        0 79bc1aa5 nfs4_cbv1 CB_RECALL_ANY a:rpc_exit_task [sunrpc] q:delayq
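
(For completeness: as far as I understand, any write to the rpc_debug
sysctl makes sunrpc dump its pending task list to the kernel log, which
is how the line above was produced; a minimal equivalent without the
rpcdebug tool would be:)

```
# Writing to this sysctl dumps all pending RPC tasks to the kernel log
# without leaving extra debugging enabled afterwards.
echo 0 > /proc/sys/sunrpc/rpc_debug
dmesg | tail -n 20
```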

The first RPC-related soft lockup splat I was able to gather from the
log/journal is:

Jan 29 22:34:05 nfsserver kernel: watchdog: BUG: soft lockup - CPU#11 stuck for 23s! [kworker/u42:3:705574]
Jan 29 22:34:05 nfsserver kernel: Modules linked in: binfmt_misc rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache bonding quota_v2 quota_tree ipmi_ssif intel_rapl_msr intel_rapl_common skx_edac skx_edac_common nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass ghash_clmulni_intel aesni_intel libaes crypto_simd cryptd ast glue_helper drm_vram_helper drm_ttm_helper rapl acpi_ipmi ttm iTCO_wdt intel_cstate ipmi_si intel_pmc_bxt drm_kms_helper mei_me iTCO_vendor_support ipmi_devintf cec ioatdma intel_uncore pcspkr evdev joydev sg i2c_algo_bit watchdog mei dca ipmi_msghandler acpi_power_meter acpi_pad button fuse drm configfs nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid1 raid0 multipath linear md_mod dm_mod hid_generic usbhid hid sd_mod t10_pi crc_t10dif crct10dif_generic xhci_pci ahci xhci_hcd libahci i40e libata
Jan 29 22:34:05 nfsserver kernel:  crct10dif_pclmul arcmsr crct10dif_common ptp crc32_pclmul usbcore crc32c_intel scsi_mod pps_core i2c_i801 lpc_ich i2c_smbus wmi usb_common
Jan 29 22:34:05 nfsserver kernel: CPU: 11 PID: 705574 Comm: kworker/u42:3 Not tainted 5.10.0-33-amd64 #1 Debian 5.10.226-1
Jan 29 22:34:05 nfsserver kernel: Hardware name: DALCO AG S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0008.031920191559 03/19/2019
Jan 29 22:34:05 nfsserver kernel: Workqueue: rpciod rpc_async_schedule [sunrpc]
Jan 29 22:34:05 nfsserver kernel: RIP: 0010:ktime_get+0x7b/0xa0
Jan 29 22:34:05 nfsserver kernel: Code: d1 e9 48 f7 d1 48 89 c2 48 85 c1 8b 05 ae 2c a5 02 8b 0d ac 2c a5 02 48 0f 45 d5 8b 35 7e 2c a5 02 41 39 f4 75 9e 48 0f af c2 <48> 01 f8 48 d3 e8 48 01 d8 5b 5d 41 5c c3 cc cc cc cc f3 90 eb 84
Jan 29 22:34:05 nfsserver kernel: RSP: 0018:ffffa1aca9227e00 EFLAGS: 00000202
Jan 29 22:34:05 nfsserver kernel: RAX: 0000371a545e1910 RBX: 000005fce82a4372 RCX: 0000000000000018
Jan 29 22:34:05 nfsserver kernel: RDX: 000000000078efbe RSI: 000000000031f238 RDI: 00385c1353c92824
Jan 29 22:34:05 nfsserver kernel: RBP: 0000000000000000 R08: ffffffffc081f410 R09: ffffffffc081f410
Jan 29 22:34:05 nfsserver kernel: R10: 0000000000000003 R11: 0000000000000003 R12: 000000000031f238
Jan 29 22:34:05 nfsserver kernel: R13: ffff8ed42bf34830 R14: 0000000000000001 R15: 0000000000000000
Jan 29 22:34:05 nfsserver kernel: FS:  0000000000000000(0000) GS:ffff8ee94f880000(0000) knlGS:0000000000000000
Jan 29 22:34:05 nfsserver kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 29 22:34:05 nfsserver kernel: CR2: 00007ffddf306080 CR3: 00000017c420a002 CR4: 00000000007706e0
Jan 29 22:34:05 nfsserver kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jan 29 22:34:05 nfsserver kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Jan 29 22:34:05 nfsserver kernel: PKRU: 55555554
Jan 29 22:34:05 nfsserver kernel: Call Trace:
Jan 29 22:34:05 nfsserver kernel:  <IRQ>
Jan 29 22:34:05 nfsserver kernel:  ? watchdog_timer_fn+0x1bb/0x210
Jan 29 22:34:05 nfsserver kernel:  ? lockup_detector_update_enable+0x50/0x50
Jan 29 22:34:05 nfsserver kernel:  ? __hrtimer_run_queues+0x127/0x280
Jan 29 22:34:05 nfsserver kernel:  ? hrtimer_interrupt+0x110/0x2c0
Jan 29 22:34:05 nfsserver kernel:  ? __sysvec_apic_timer_interrupt+0x5c/0xe0
Jan 29 22:34:05 nfsserver kernel:  ? asm_call_irq_on_stack+0xf/0x20
Jan 29 22:34:05 nfsserver kernel:  </IRQ>
Jan 29 22:34:05 nfsserver kernel:  ? sysvec_apic_timer_interrupt+0x72/0x80
Jan 29 22:34:05 nfsserver kernel:  ? asm_sysvec_apic_timer_interrupt+0x12/0x20
Jan 29 22:34:05 nfsserver kernel:  ? ktime_get+0x7b/0xa0
Jan 29 22:34:05 nfsserver kernel:  rpc_exit_task+0x96/0x140 [sunrpc]
Jan 29 22:34:05 nfsserver kernel:  ? __rpc_do_wake_up_task_on_wq+0x1e0/0x1e0 [sunrpc]
Jan 29 22:34:05 nfsserver kernel:  __rpc_execute+0x6d/0x410 [sunrpc]
Jan 29 22:34:05 nfsserver kernel:  rpc_async_schedule+0x29/0x40 [sunrpc]
Jan 29 22:34:05 nfsserver kernel:  process_one_work+0x1b3/0x350
Jan 29 22:34:05 nfsserver kernel:  worker_thread+0x53/0x3e0
Jan 29 22:34:05 nfsserver kernel:  ? process_one_work+0x350/0x350
Jan 29 22:34:05 nfsserver kernel:  kthread+0x118/0x140
Jan 29 22:34:05 nfsserver kernel:  ? __kthread_bind_mask+0x60/0x60
Jan 29 22:34:05 nfsserver kernel:  ret_from_fork+0x1f/0x30

I can try to apply, on top of the kernel, the change Chuck mentioned to
me off-list, which is the posting at
https://lore.kernel.org/linux-nfs/1738271066-6727-1-git-send-email-dai.ngo@xxxxxxxxxx/,
and in fact this could be interesting. If the users keep generating the
same kind of load, this might help us understand the issue better.
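
For the record, the way I would pull that posting onto our tree is
roughly this (a sketch; <message-id> stands for the Message-Id of the
lore posting above, and it assumes b4 is installed):

```
# Fetch the posting from lore in am-ready form and apply it with a
# three-way merge on top of the current kernel tree.
b4 am -o - <message-id> | git am -3
```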

As we suspect that the issue is triggered more frequently after the
switch from 5.10.218 to 5.10.221, this reinforces the above: the posting
says it fixes 66af25799940 ("NFSD: add courteous server support for
thread with only delegation"), which is in 5.19-rc1 but was backported
to 5.15.154 and 5.10.220 as well.
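
Should we ever manage to get a reproducer, the bisect Chuck suggested
would then be between those two stable releases, roughly (a sketch,
using the linux-stable tags):

```
git bisect start v5.10.221 v5.10.218
# Build, boot, try to reproduce, then mark the result and repeat:
git bisect good    # or: git bisect bad
```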

Thanks for all your help,

Regards,
Salvatore




