On Tue, Feb 04, 2025 at 12:20:30PM -0800, Paul E. McKenney wrote:
> On Tue, Feb 04, 2025 at 05:34:09PM +0100, Sebastian Andrzej Siewior wrote:
> > On 2025-02-04 03:51:48 [-0800], Paul E. McKenney wrote:
> > > On Tue, Feb 04, 2025 at 11:26:11AM +0100, Sebastian Andrzej Siewior wrote:
> > > > On 2025-01-30 10:53:19 [-0800], Paul E. McKenney wrote:
> > > > > The timer and hrtimer softirq processing has moved to dedicated threads
> > > > > for kernels built with CONFIG_IRQ_FORCED_THREADING=y.  This results in
> > > > > timers not expiring until later in early boot, which in turn causes the
> > > > > RCU Tasks self-tests to hang in kernels built with CONFIG_PROVE_RCU=y,
> > > > > which further causes the entire kernel to hang.  One fix would be to
> > > > > make timers work during this time, but there are no known users of RCU
> > > > > Tasks grace periods during that time, so no justification for the added
> > > > > complexity.  Not yet, anyway.
> > > > >
> > > > > This commit therefore moves the call to rcu_init_tasks_generic() from
> > > > > kernel_init_freeable() to a core_initcall().  This works because the
> > > > > timer and hrtimer kthreads are created at early_initcall() time.
> > > >
> > > > Fixes: 49a17639508c3 ("softirq: Use a dedicated thread for timer wakeups on PREEMPT_RT.")
> > > > ?
> > >
> > > Quite possibly...  I freely confess that I was more focused on the fix
> > > than on the bug's origin.  Would you be willing to try this commit and
> > > its predecessor?
> >
> > Yes. Just verified.
> > Tested-by: Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx>
> > Reviewed-by: Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx>
>
> Boqun, could you please apply Sebastian's tags, including the Fixes
> tag above?
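For the archives, the shape of that fix is roughly as follows.  This is
a simplified sketch rather than the exact patch, with the real self-test
setup elided:

------------------------------------------------------------------------

#include <linux/init.h>

/*
 * Sketch: rcu_init_tasks_generic() is run from the initcall sequence
 * rather than being called directly from kernel_init_freeable() in
 * init/main.c.  Because core_initcall()s run after all of the
 * early_initcall()s, including those that spawn the timer and hrtimer
 * kthreads, timers can expire by the time the self-tests start.
 */
static int __init rcu_init_tasks_generic(void)
{
	/* ... create RCU Tasks kthreads and kick off the self-tests ... */
	return 0;
}
core_initcall(rcu_init_tasks_generic);

------------------------------------------------------------------------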
> > > > I played with it and I can reproduce the issue with !RT + threadirqs but
> > > > not with RT (which implies threadirqs).  Is there anything in RT that
> > > > avoids the problem?
> > >
> > > Not that I know of, but then again I did not try it.  To your point,
> >
> > The change looks fine.
> >
> > > I do need to make a -rt rcutorture scenario.  TREE03 has been intended to
> > > approximate this, and it uses the following Kconfig options:
> > >
> > > ------------------------------------------------------------------------
> > >
> > > CONFIG_SMP=y
> > > CONFIG_NR_CPUS=16
> > > CONFIG_PREEMPT_NONE=n
> > > CONFIG_PREEMPT_VOLUNTARY=n
> > > CONFIG_PREEMPT=y
> > > #CHECK#CONFIG_PREEMPT_RCU=y
> > > CONFIG_HZ_PERIODIC=y
> > > CONFIG_NO_HZ_IDLE=n
> > > CONFIG_NO_HZ_FULL=n
> > > CONFIG_RCU_TRACE=y
> > > CONFIG_HOTPLUG_CPU=y
> > > CONFIG_RCU_FANOUT=2
> > > CONFIG_RCU_FANOUT_LEAF=2
> > > CONFIG_RCU_NOCB_CPU=n
> > > CONFIG_DEBUG_LOCK_ALLOC=n
> > > CONFIG_RCU_BOOST=y
> > > CONFIG_DEBUG_OBJECTS_RCU_HEAD=n
> > > CONFIG_RCU_EXPERT=y
> >
> > You could enable CONFIG_PREEMPT_RT ;)
> > CONFIG_PREEMPT_LAZY is probably also set a lot.
> >
> > That should be it.
>
> > > ------------------------------------------------------------------------
> > >
> > > And the following kernel-boot parameters:
> > >
> > > ------------------------------------------------------------------------
> > >
> > > rcutorture.onoff_interval=200 rcutorture.onoff_holdoff=30
> > > rcutree.gp_preinit_delay=12
> > > rcutree.gp_init_delay=3
> > > rcutree.gp_cleanup_delay=3
> > > rcutree.kthread_prio=2
> > > threadirqs
> > > rcutree.use_softirq=0
> > > rcutorture.preempt_duration=10
> > >
> > > ------------------------------------------------------------------------
> > >
> > > Some of these are for RCU's benefit, but what should I change to more
> > > closely approximate a typical real-time deployment?
> >
> > See above.
>
> Which got me this diff:
>
> ------------------------------------------------------------------------
>
> diff --git a/tools/testing/selftests/rcutorture/configs/rcu/TREE03 b/tools/testing/selftests/rcutorture/configs/rcu/TREE03
> index 2dc31b16e506..6158f5002497 100644
> --- a/tools/testing/selftests/rcutorture/configs/rcu/TREE03
> +++ b/tools/testing/selftests/rcutorture/configs/rcu/TREE03
> @@ -2,7 +2,9 @@ CONFIG_SMP=y
>  CONFIG_NR_CPUS=16
>  CONFIG_PREEMPT_NONE=n
>  CONFIG_PREEMPT_VOLUNTARY=n
> -CONFIG_PREEMPT=y
> +CONFIG_PREEMPT=n
> +CONFIG_PREEMPT_LAZY=y
> +CONFIG_PREEMPT_RT=y
>  #CHECK#CONFIG_PREEMPT_RCU=y
>  CONFIG_HZ_PERIODIC=y
>  CONFIG_NO_HZ_IDLE=n
> @@ -15,4 +17,5 @@ CONFIG_RCU_NOCB_CPU=n
>  CONFIG_DEBUG_LOCK_ALLOC=n
>  CONFIG_RCU_BOOST=y
>  CONFIG_DEBUG_OBJECTS_RCU_HEAD=n
> +CONFIG_EXPERT=y
>  CONFIG_RCU_EXPERT=y
>
> ------------------------------------------------------------------------
>
> But a 10-minute run got me the splat shown below, and in addition a
> shutdown-time hang.
>
> This is caused by RCU falling behind a callback-flooding kthread that
> invokes call_rcu() in a semi-tight loop.  Setting rcutree.kthread_prio=40
> avoids the splat, but still gets the shutdown-time hang.  Retrying with
> the default rcutree.kthread_prio=2 failed to reproduce the splat, but
> it did reproduce the shutdown-time hang.
>
> OK, maybe printk buffers are not being flushed?  A 100-millisecond sleep
> at the end of rcu_torture_cleanup() got all of rcutorture's output
> flushed, but lost the subsequent shutdown-time console traffic.  The
> pr_flush(HZ/10, 1) seems more sensible, but this is private to printk().
>
> I would like to log the shutdown-time console traffic because RCU can
> sometimes break things on that path.
>
> Thoughts?

Longer rcutorture runs showed (not unexpectedly) that the 100-millisecond
sleep was not always sufficient, nor was a 500-millisecond sleep.

There is a call to kmsg_dump(KMSG_DUMP_SHUTDOWN) in kernel_power_off()
that appears to be intended to dump out the printk() buffers, but it
does not seem to do so in kernels built with CONFIG_PREEMPT_RT=y.

Does there need to be a pr_flush() call prior to the call to
migrate_to_reboot_cpu()?  Or maybe even prior to
do_kernel_power_off_prepare() or kernel_shutdown_prepare()?

Adding John Ogness on CC so that he can tell me the error of my ways.
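To make the question concrete, the placement being asked about would
look something like this sketch of kernel/reboot.c, assuming pr_flush()
were made available outside of printk() (which it currently is not),
and with the earlier placements being the alternatives noted above:

------------------------------------------------------------------------

/* Sketch only: flush pending console output before this CPU goes away. */
void kernel_power_off(void)
{
	kernel_shutdown_prepare(SYSTEM_POWER_OFF);
	do_kernel_power_off_prepare();
	pr_flush(100, true);	/* hypothetical: up to 100 ms for consoles */
	migrate_to_reboot_cpu();
	syscore_shutdown();
	pr_emerg("Power down\n");
	kmsg_dump(KMSG_DUMP_SHUTDOWN);
	machine_power_off();
}

------------------------------------------------------------------------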
> PS:  I will do longer runs in case that splat was not a one-off.
> My concern is that I might need to adjust something more in order
> to get a reliable callback-flooding test.

And this was not a one-off.  Running ten 40-minute instances of the
new-age CONFIG_PREEMPT_RT=y TREE03 reliably triggers OOMs.  At first
glance, this appears to be an interaction between testing of RCU
priority boosting and RCU-callback flooding forward-progress testing.
And disabling testing of RCU priority boosting avoids these OOMs, as
does running without CONFIG_PREEMPT_RT=y.

My next step is to run with rcutorture.preempt_duration=0, which
disables within-guest-OS random preemption of kthreads.  If that
doesn't help, I expect to play around with avoiding concurrent testing
of RCU priority boosting and RCU callback-flooding forward progress.

Or is there a better way?

							Thanx, Paul
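P.S.  For anyone wanting to poke at this outside of rcutorture, the
callback flood amounts to something like the following, with
hypothetical names and with rcutorture's pacing and forward-progress
checking omitted:

------------------------------------------------------------------------

#include <linux/kernel.h>
#include <linux/kthread.h>
#include <linux/rcupdate.h>
#include <linux/sched.h>
#include <linux/slab.h>

struct flood_cb {
	struct rcu_head rh;
};

static void flood_cb_func(struct rcu_head *rhp)
{
	kfree(container_of(rhp, struct flood_cb, rh));
}

/*
 * Post callbacks about as fast as kmalloc() permits.  If RCU's
 * grace-period machinery cannot keep up, callbacks accumulate and
 * the system eventually OOMs, which is the failure mode above.
 */
static int flood_kthread(void *arg)
{
	struct flood_cb *fcb;

	while (!kthread_should_stop()) {
		fcb = kmalloc(sizeof(*fcb), GFP_KERNEL);
		if (fcb)
			call_rcu(&fcb->rh, flood_cb_func);
		cond_resched();	/* semi-tight: yield occasionally */
	}
	return 0;
}

------------------------------------------------------------------------

A real test would of course also need an rcu_barrier() before tearing
any of this down.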