Re: [PATCH rcu v2] 4/5] rcu-tasks: Move RCU Tasks self-tests to core_initcall()

"Paul E. McKenney" <paulmck@xxxxxxxxxx> · Tue, 4 Feb 2025 12:20:30 -0800

On Tue, Feb 04, 2025 at 05:34:09PM +0100, Sebastian Andrzej Siewior wrote:
> On 2025-02-04 03:51:48 [-0800], Paul E. McKenney wrote:
> > On Tue, Feb 04, 2025 at 11:26:11AM +0100, Sebastian Andrzej Siewior wrote:
> > > On 2025-01-30 10:53:19 [-0800], Paul E. McKenney wrote:
> > > > The timer and hrtimer softirq processing has moved to dedicated threads
> > > > for kernels built with CONFIG_IRQ_FORCED_THREADING=y.  This results in
> > > > timers not expiring until later in early boot, which in turn causes the
> > > > RCU Tasks self-tests to hang in kernels built with CONFIG_PROVE_RCU=y,
> > > > which further causes the entire kernel to hang.  One fix would be to
> > > > make timers work during this time, but there are no known users of RCU
> > > > Tasks grace periods during that time, so no justification for the added
> > > > complexity.  Not yet, anyway.
> > > > 
> > > > This commit therefore moves the call to rcu_init_tasks_generic() from
> > > > kernel_init_freeable() to a core_initcall().  This works because the
> > > > timer and hrtimer kthreads are created at early_initcall() time.
> > > 
> > > Fixes: 49a17639508c3 ("softirq: Use a dedicated thread for timer wakeups on PREEMPT_RT.")
> > > ?
> > 
> > Quite possibly...  I freely confess that I was more focused on the fix
> > than on the bug's origin.  Would you be willing to try this commit and
> > its predecessor?
> 
> Yes. Just verified.
> Tested-by: Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx>
> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx>

Boqun, could you please apply Sebastian's tags, including the Fixes
tag above?

> > > I played with it and I can reproduce the issue with !RT + threadirqs but
> > > not with RT (which implies threadirqs).
> > > Is there anything in RT that avoids the problem?
> > 
> > Not that I know of, but then again I did not try it.  To your point,
> 
> The change looks fine.
> 
> > I do need to make a -rt rcutorture scenario.  TREE03 has been intended to
> > approximate this, and it uses the following Kconfig options:
> > 
> > ------------------------------------------------------------------------
> > 
> > CONFIG_SMP=y
> > CONFIG_NR_CPUS=16
> > CONFIG_PREEMPT_NONE=n
> > CONFIG_PREEMPT_VOLUNTARY=n
> > CONFIG_PREEMPT=y
> > #CHECK#CONFIG_PREEMPT_RCU=y
> > CONFIG_HZ_PERIODIC=y
> > CONFIG_NO_HZ_IDLE=n
> > CONFIG_NO_HZ_FULL=n
> > CONFIG_RCU_TRACE=y
> > CONFIG_HOTPLUG_CPU=y
> > CONFIG_RCU_FANOUT=2
> > CONFIG_RCU_FANOUT_LEAF=2
> > CONFIG_RCU_NOCB_CPU=n
> > CONFIG_DEBUG_LOCK_ALLOC=n
> > CONFIG_RCU_BOOST=y
> > CONFIG_DEBUG_OBJECTS_RCU_HEAD=n
> > CONFIG_RCU_EXPERT=y
> 
> You could enable CONFIG_PREEMPT_RT ;)
> CONFIG_PREEMPT_LAZY is probably also set a lot.
> 
> That should be it.
> 
> > ------------------------------------------------------------------------
> > 
> > And the following kernel-boot parameters:
> > 
> > ------------------------------------------------------------------------
> > 
> > rcutorture.onoff_interval=200 rcutorture.onoff_holdoff=30
> > rcutree.gp_preinit_delay=12
> > rcutree.gp_init_delay=3
> > rcutree.gp_cleanup_delay=3
> > rcutree.kthread_prio=2
> > threadirqs
> > rcutree.use_softirq=0
> > rcutorture.preempt_duration=10
> > 
> > ------------------------------------------------------------------------
> > 
> > Some of these are for RCU's benefit, but what should I change to more
> > closely approximate a typical real-time deployment?
> 
> See above.

Which got me this diff:

------------------------------------------------------------------------

diff --git a/tools/testing/selftests/rcutorture/configs/rcu/TREE03 b/tools/testing/selftests/rcutorture/configs/rcu/TREE03
index 2dc31b16e506..6158f5002497 100644
--- a/tools/testing/selftests/rcutorture/configs/rcu/TREE03
+++ b/tools/testing/selftests/rcutorture/configs/rcu/TREE03
@@ -2,7 +2,9 @@ CONFIG_SMP=y
 CONFIG_NR_CPUS=16
 CONFIG_PREEMPT_NONE=n
 CONFIG_PREEMPT_VOLUNTARY=n
-CONFIG_PREEMPT=y
+CONFIG_PREEMPT=n
+CONFIG_PREEMPT_LAZY=y
+CONFIG_PREEMPT_RT=y
 #CHECK#CONFIG_PREEMPT_RCU=y
 CONFIG_HZ_PERIODIC=y
 CONFIG_NO_HZ_IDLE=n
@@ -15,4 +17,5 @@ CONFIG_RCU_NOCB_CPU=n
 CONFIG_DEBUG_LOCK_ALLOC=n
 CONFIG_RCU_BOOST=y
 CONFIG_DEBUG_OBJECTS_RCU_HEAD=n
+CONFIG_EXPERT=y
 CONFIG_RCU_EXPERT=y

------------------------------------------------------------------------

But a 10-minute run got me the splat shown below, and in addition a
shutdown-time hang.

This is caused by RCU falling behind a callback-flooding kthread that
invokes call_rcu() in a semi-tight loop.  Setting rcutree.kthread_prio=40
avoids the splat, but still gets the shutdown-time hang.  Retrying with
the default rcutree.kthread_prio=2 failed to reproduce the splat, but
it did reproduce the shutdown-time hang.

OK, maybe printk buffers are not being flushed?  A 100-millisecond sleep
at the end of of rcu_torture_cleanup() got all of rcutorture's output
flushed, but lost the subsequent shutdown-time console traffic.  The
pr_flush(HZ/10,1) seems more sensible, but this is private to printk().

I would like to log the shutdown-time console traffic because RCU can
sometimes break things on that path.

Thoughts?

							Thanx, Paul

PS:  I will do longer runs in case that splat was not a one-off.
     My concern is that I might need to adjust something more in order
     to get a reliable callback-flooding test.

------------------------------------------------------------------------

[  133.107605] ------------[ cut here ]------------
[  133.107611] rcutorture_oom_notify invoked upon OOM during forward-progress testing.
[  133.107636] WARNING: CPU: 0 PID: 236 at kernel/rcu/rcutorture.c:3117 rcutorture_oom_notify+0x48/0x2e0
[  133.107651] Modules linked in:
[  133.107662] CPU: 0 UID: 0 PID: 236 Comm: rcu_torture_fwd Not tainted 6.13.0-rc2-00108-g56d12f072c66-dirty #1747
[  133.107667] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014
[  133.107669] RIP: 0010:rcutorture_oom_notify+0x48/0x2e0
[  133.107673] Code: e8 3d ca c3 00 48 8b 2d 5e 8e 17 02 48 85 ed 0f 84 82 02 00 00 90 48 c7 c6 a0 fc 20 a7 48 c7 c7 40 82 68 a7 e8 f9 38 f6 ff 90 <0f> 0b 90 90 8b 35 36 10 82 01 45 31 e4 85 f6 7e 3c 49 63 dc 41 83
[  133.107675] RSP: 0018:ffffa3b10081fab0 EFLAGS: 00010286
[  133.107679] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000027
[  133.107681] RDX: ffff8bf11f21c948 RSI: 0000000000000001 RDI: ffff8bf11f21c940
[  133.107682] RBP: ffff8bf101ff8000 R08: ffffffffa7b42608 R09: 00000000ffffdfff
[  133.107683] R10: ffffffffa7a62620 R11: ffffffffa7b12620 R12: 0000000000000000
[  133.107685] R13: ffffa3b10081fb60 R14: 00000000ffffffff R15: 0000000000000000
[  133.107688] FS:  0000000000000000(0000) GS:ffff8bf11f200000(0000) knlGS:0000000000000000
[  133.107690] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  133.107691] CR2: ffff8bf10fe01000 CR3: 0000000001f88000 CR4: 00000000000006f0
[  133.107693] Call Trace:
[  133.107709]  <TASK>
[  133.107713]  ? __warn+0x83/0x130
[  133.107716]  ? rcutorture_oom_notify+0x48/0x2e0
[  133.107720]  ? report_bug+0x18e/0x1a0
[  133.107723]  ? handle_bug+0x54/0x90
[  133.107727]  ? exc_invalid_op+0x18/0x70
[  133.107729]  ? asm_exc_invalid_op+0x1a/0x20
[  133.107733]  ? rcutorture_oom_notify+0x48/0x2e0
[  133.107736]  notifier_call_chain+0x59/0xc0
[  133.107744]  blocking_notifier_call_chain+0x3c/0x60
[  133.107746]  out_of_memory+0x1f3/0x710
[  133.107754]  __alloc_pages_noprof+0xf27/0x10b0
[  133.107764]  ? timerqueue_del+0x2e/0x50
[  133.107767]  alloc_pages_mpol_noprof+0x43/0xf0
[  133.107774]  new_slab+0x37d/0x440
[  133.107796]  ___slab_alloc+0x480/0x9a0
[  133.107801]  ? rcu_torture_fwd_prog+0x344/0x9e0
[  133.107803]  ? preempt_schedule_thunk+0x1a/0x30
[  133.107806]  ? rcu_torture_fwd_prog+0x344/0x9e0
[  133.107808]  __kmalloc_cache_noprof+0x74/0x1f0
[  133.107811]  rcu_torture_fwd_prog+0x344/0x9e0
[  133.107814]  ? __pfx_rcu_torture_fwd_prog+0x10/0x10
[  133.107816]  ? kthread+0xd1/0x100
[  133.107818]  ? __pfx_rcu_torture_fwd_prog+0x10/0x10
[  133.107819]  kthread+0xd1/0x100
[  133.107824]  ? __pfx_kthread+0x10/0x10
[  133.107826]  ret_from_fork+0x2f/0x50
[  133.107833]  ? __pfx_kthread+0x10/0x10
[  133.107835]  ret_from_fork_asm+0x1a/0x30
[  133.107842]  </TASK>
[  133.107843] ---[ end trace 0000000000000000 ]---