Re: Advice sought on RCU stalls on ARM64 WSL2

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue Mar 5, 2024 at 5:57 AM UTC, Max Boone wrote:
> On Tue Mar 5, 2024 at 12:32 AM UTC, Joel Fernandes wrote:
> > Have you tried to reproduce the issue with CONFIG_RSEQ=n and see if it happens?
>
> Will build a new kernel today with that flag, and report back.

With CONFIG_RSEQ=n the stalls happen a lot less often, the system is
way more workable and when it does freeze up it recovered on its own
once, and I was able to get the full kernel messages for this:

```
[  675.812339] rcu: INFO: rcu_sched self-detected stall on CPU
[  675.814587] rcu:     3-....: (14893 ticks this GP) idle=762c/1/0x4000000000000000 softirq=6920/6920 fqs=6610
[  675.815606] rcu:     (t=15001 jiffies g=50497 q=1304 ncpus=8)
[  675.816520] CPU: 3 PID: 232 Comm: snapfuse Not tainted 6.7.7-WSL2-STABLE+ #2
[  675.816550] pstate: 20400005 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  675.816553] pc : __arch_copy_to_user+0x1a0/0x240
[  675.817689] lr : _copy_to_iter+0xf0/0x560
[  675.818069] sp : ffff800082ceba80
[  675.818070] x29: ffff800082cebac0 x28: 0000000001b2c000 x27: 0000000000000005
[  675.818074] x26: 0000000000000000 x25: ffff00004c491000 x24: 0000000000000000
[  675.818076] x23: 0000000000001000 x22: 0000040000000000 x21: ffff800082cebd30
[  675.818079] x20: ffff800082cebd30 x19: 0000000000001000 x18: 0000000000000000
[  675.818081] x17: 0000000000000000 x16: 0000000000000000 x15: ffff00004c491000
[  675.818083] x14: 9887db4ae914c054 x13: 6bcd444ce14effe5 x12: 0b22b481c6001041
[  675.818086] x11: 7513c0250d7df247 x10: b85affa4063b12c7 x9 : 368beb85bc648557
[  675.818088] x8 : 217c88df9795370e x7 : a16d77942052b4ab x6 : 0000aaf844516fff
[  675.818090] x5 : 0000aaf844517e2f x4 : 0000000000000000 x3 : 0000000000003daf
[  675.818092] x2 : 0000000000000dc0 x1 : ffff00004c491210 x0 : 0000aaf844516e2f
[  675.818096] Call trace:
[  675.818143]  __arch_copy_to_user+0x1a0/0x240
[  675.818147]  copy_page_to_iter+0xbc/0x140
[  675.818150]  filemap_read+0x1b0/0x398
[  675.818427]  generic_file_read_iter+0x48/0x168
[  675.818429]  ext4_file_read_iter+0x58/0x288
[  675.818681]  vfs_read+0x1e8/0x280
[  675.818804]  ksys_pread64+0x90/0xf0
[  675.818806]  __arm64_sys_pread64+0x24/0x48
[  675.818807]  invoke_syscall.constprop.0+0x54/0x128
[  675.818912]  do_el0_svc+0x44/0xf0
[  675.818914]  el0_svc+0x24/0xb0
[  675.819041]  el0t_64_sync_handler+0x138/0x148
[  675.819043]  el0t_64_sync+0x14c/0x150
[  681.501178] block sda: the capability attribute has been deprecated.
[  741.700330] rcu: INFO: rcu_sched self-detected stall on CPU
[  741.701707] rcu:     4-....: (14940 ticks this GP) idle=8074/1/0x4000000000000000 softirq=13021/13037 fqs=6400
[  741.703152] rcu:     (t=15001 jiffies g=50713 q=5093 ncpus=8)
[  741.704017] CPU: 4 PID: 194 Comm: systemd-journal Not tainted 6.7.7-WSL2-STABLE+ #2
[  741.704047] pstate: 20400005 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  741.704050] pc : __arch_copy_to_user+0x190/0x240
[  741.704424] lr : _copy_to_iter+0xf0/0x560
[  741.704565] sp : ffff800082cfb870
[  741.704566] x29: ffff800082cfb8b0 x28: ffff00000fe96b18 x27: 0000000000000085
[  741.704569] x26: 0000000000000000 x25: ffff0000061e1c00 x24: 0000000000000000
[  741.704571] x23: 0000000000000085 x22: ffff000117d2a600 x21: ffff800082cfbd90
[  741.704574] x20: ffff0000061e1c00 x19: 0000000000000085 x18: 0000000000000000
[  741.704608] x17: 0000000000000000 x16: 0000000000000000 x15: ffff0000061e1c00
[  741.704610] x14: 62616c6961766120 x13: 7365746164707520 x12: 6f6e207361682070
[  741.704612] x11: 616e73203a687365 x10: 7266657220746f6e x9 : 6e6163203a313937
[  741.704614] x8 : 3a6f672e73726570 x7 : 6c656865726f7473 x6 : 0000ab58c96fd6f0
[  741.704617] x5 : 0000ab58c96fd775 x4 : 0000000000000000 x3 : 0000000000000000
[  741.704619] x2 : 0000000000000005 x1 : ffff0000061e1c40 x0 : 0000ab58c96fd6f0
[  741.704621] Call trace:
[  741.704647]  __arch_copy_to_user+0x190/0x240
[  741.704651]  simple_copy_to_iter+0x48/0x98
[  741.704939]  __skb_datagram_iter+0x7c/0x280
[  741.704941]  skb_copy_datagram_iter+0x48/0xc8
[  741.704943]  unix_stream_read_actor+0x30/0x68
[  741.705137]  unix_stream_read_generic+0x304/0xb70
[  741.705139]  unix_stream_recvmsg+0xc0/0xd0
[  741.705140]  sock_recvmsg+0x88/0x108
[  741.705170]  ____sys_recvmsg+0x78/0x198
[  741.705171]  ___sys_recvmsg+0x80/0xf0
[  741.705173]  __sys_recvmsg+0x5c/0xd0
[  741.705175]  __arm64_sys_recvmsg+0x28/0x50
[  741.705177]  invoke_syscall.constprop.0+0x54/0x128
[  741.705316]  do_el0_svc+0xcc/0xf0
[  741.705317]  el0_svc+0x24/0xb0
[  741.705369]  el0t_64_sync_handler+0x138/0x148
[  741.705371]  el0t_64_sync+0x14c/0x150
[  743.232431] rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 4-.... } 15347 jiffies s: 721 root: 0x10/.
[  743.234297] rcu: blocking rcu_node structures (internal RCU debug):
[  743.235477] Sending NMI from CPU 1 to CPUs 4:
[  743.235491] NMI backtrace for cpu 4
[  743.235531] CPU: 4 PID: 194 Comm: systemd-journal Not tainted 6.7.7-WSL2-STABLE+ #2
[  743.235535] pstate: 20400005 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  743.235537] pc : __arch_copy_to_user+0x190/0x240
[  743.235598] lr : _copy_to_iter+0xf0/0x560
[  743.235603] sp : ffff800082cfb870
[  743.235604] x29: ffff800082cfb8b0 x28: ffff00000fe96b18 x27: 0000000000000085
[  743.235607] x26: 0000000000000000 x25: ffff0000061e1c00 x24: 0000000000000000
[  743.235610] x23: 0000000000000085 x22: ffff000117d2a600 x21: ffff800082cfbd90
[  743.235612] x20: ffff0000061e1c00 x19: 0000000000000085 x18: 0000000000000000
[  743.235614] x17: 0000000000000000 x16: 0000000000000000 x15: ffff0000061e1c00
[  743.235617] x14: 62616c6961766120 x13: 7365746164707520 x12: 6f6e207361682070
[  743.235619] x11: 616e73203a687365 x10: 7266657220746f6e x9 : 6e6163203a313937
[  743.235621] x8 : 3a6f672e73726570 x7 : 6c656865726f7473 x6 : 0000ab58c96fd6f0
[  743.235623] x5 : 0000ab58c96fd775 x4 : 0000000000000000 x3 : 0000000000000000
[  743.235626] x2 : 0000000000000005 x1 : ffff0000061e1c40 x0 : 0000ab58c96fd6f0
[  743.235628] Call trace:
[  743.235630]  __arch_copy_to_user+0x190/0x240
[  743.235632]  simple_copy_to_iter+0x48/0x98
[  743.235636]  __skb_datagram_iter+0x7c/0x280
[  743.235639]  skb_copy_datagram_iter+0x48/0xc8
[  743.235641]  unix_stream_read_actor+0x30/0x68
[  743.235644]  unix_stream_read_generic+0x304/0xb70
[  743.235646]  unix_stream_recvmsg+0xc0/0xd0
[  743.235647]  sock_recvmsg+0x88/0x108
[  743.235650]  ____sys_recvmsg+0x78/0x198
[  743.235651]  ___sys_recvmsg+0x80/0xf0
[  743.235653]  __sys_recvmsg+0x5c/0xd0
[  743.235655]  __arm64_sys_recvmsg+0x28/0x50
[  743.235657]  invoke_syscall.constprop.0+0x54/0x128
[  743.235661]  do_el0_svc+0xcc/0xf0
[  743.235663]  el0_svc+0x24/0xb0
[  743.235667]  el0t_64_sync_handler+0x138/0x148
[  743.235668]  el0t_64_sync+0x14c/0x150
```

Another time it didn't recover, and I was only able to get whatever was
printed to console:

```
[ 1559.425979] rcu:     7-....: (14977 ticks this GP) idle=d4ec/1/0x4000000000000000 softirq=18636/18636 fqs=5263
[ 1559.431367] rcu:     (t=15002 jiffies g=67965 q=36939 ncpus=8)
[ 1559.432083] rcu: rcu_sched kthread starved for 2866 jiffies! g67965 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0
[ 1559.433645] rcu:     Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior.
[ 1559.434604] rcu: RCU grace-period kthread stack dump:
[ 1559.435549] rcu: Stack dump where RCU GP kthread last ran:
[ 1739.451891] rcu: INFO: rcu_sched self-detected stall on CPU
[ 1739.452511] rcu:     7-....: (59616 ticks this GP) idle=d4ec/1/0x4000000000000000 softirq=18636/18636 fqs=5263
[ 1739.453498] rcu:     (t=60008 jiffies g=67965 q=36939 ncpus=8)
[ 1739.454053] rcu: rcu_sched kthread starved for 47871 jiffies! g67965 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0
[ 1739.455135] rcu:     Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior.
[ 1739.456110] rcu: RCU grace-period kthread stack dump:
[ 1739.456687] rcu: Stack dump where RCU GP kthread last ran:
[ 1919.467822] rcu: INFO: rcu_sched self-detected stall on CPU
[ 1919.468776] rcu:     7-....: (104010 ticks this GP) idle=d4ec/1/0x4000000000000000 softirq=18636/18636 fqs=5263
[ 1919.470405] rcu:     (t=105012 jiffies g=67965 q=36941 ncpus=8)
[ 1919.472599] rcu: rcu_sched kthread starved for 92875 jiffies! g67965 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0
[ 1919.474013] rcu:     Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior.
[ 1919.474977] rcu: RCU grace-period kthread stack dump:
[ 1919.475641] rcu: Stack dump where RCU GP kthread last ran:
```

Unfortunately the traces are not provided there.






[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux