On Tue, Feb 04, 2025 at 12:20:30PM -0800, Paul E. McKenney wrote:
> On Tue, Feb 04, 2025 at 05:34:09PM +0100, Sebastian Andrzej Siewior wrote:
> > On 2025-02-04 03:51:48 [-0800], Paul E. McKenney wrote:
> > > On Tue, Feb 04, 2025 at 11:26:11AM +0100, Sebastian Andrzej Siewior wrote:
> > > > On 2025-01-30 10:53:19 [-0800], Paul E. McKenney wrote:
> > > > > The timer and hrtimer softirq processing has moved to dedicated threads
> > > > > for kernels built with CONFIG_IRQ_FORCED_THREADING=y.  This results in
> > > > > timers not expiring until later in early boot, which in turn causes the
> > > > > RCU Tasks self-tests to hang in kernels built with CONFIG_PROVE_RCU=y,
> > > > > which further causes the entire kernel to hang.  One fix would be to
> > > > > make timers work during this time, but there are no known users of RCU
> > > > > Tasks grace periods during that time, so no justification for the added
> > > > > complexity.  Not yet, anyway.
> > > > >
> > > > > This commit therefore moves the call to rcu_init_tasks_generic() from
> > > > > kernel_init_freeable() to a core_initcall().  This works because the
> > > > > timer and hrtimer kthreads are created at early_initcall() time.
> > > >
> > > > Fixes: 49a17639508c3 ("softirq: Use a dedicated thread for timer wakeups on PREEMPT_RT.")
> > > > ?
> > >
> > > Quite possibly...  I freely confess that I was more focused on the fix
> > > than on the bug's origin.  Would you be willing to try this commit and
> > > its predecessor?
> >
> > Yes. Just verified.
> > Tested-by: Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx>
> > Reviewed-by: Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx>
>
> Boqun, could you please apply Sebastian's tags, including the Fixes
> tag above?
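For the archives, the shape of that fix is roughly as follows.  This is
a simplified sketch rather than the exact patch, with the real self-test
setup elided:

------------------------------------------------------------------------

#include <linux/init.h>

/*
 * Sketch: rcu_init_tasks_generic() is run from the initcall sequence
 * rather than being called directly from kernel_init_freeable() in
 * init/main.c.  Because core_initcall()s run after all of the
 * early_initcall()s, including those that spawn the timer and hrtimer
 * kthreads, timers can expire by the time the self-tests start.
 */
static int __init rcu_init_tasks_generic(void)
{
	/* ... create RCU Tasks kthreads and kick off the self-tests ... */
	return 0;
}
core_initcall(rcu_init_tasks_generic);

------------------------------------------------------------------------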
> > > > I played with it and I can reproduce the issue with !RT + threadirqs but
> > > > not with RT (which implies threadirqs).  Is there anything in RT that
> > > > avoids the problem?
> > >
> > > Not that I know of, but then again I did not try it.  To your point,
> >
> > The change looks fine.
> >
> > > I do need to make a -rt rcutorture scenario.  TREE03 has been intended to
> > > approximate this, and it uses the following Kconfig options:
> > >
> > > ------------------------------------------------------------------------
> > >
> > > CONFIG_SMP=y
> > > CONFIG_NR_CPUS=16
> > > CONFIG_PREEMPT_NONE=n
> > > CONFIG_PREEMPT_VOLUNTARY=n
> > > CONFIG_PREEMPT=y
> > > #CHECK#CONFIG_PREEMPT_RCU=y
> > > CONFIG_HZ_PERIODIC=y
> > > CONFIG_NO_HZ_IDLE=n
> > > CONFIG_NO_HZ_FULL=n
> > > CONFIG_RCU_TRACE=y
> > > CONFIG_HOTPLUG_CPU=y
> > > CONFIG_RCU_FANOUT=2
> > > CONFIG_RCU_FANOUT_LEAF=2
> > > CONFIG_RCU_NOCB_CPU=n
> > > CONFIG_DEBUG_LOCK_ALLOC=n
> > > CONFIG_RCU_BOOST=y
> > > CONFIG_DEBUG_OBJECTS_RCU_HEAD=n
> > > CONFIG_RCU_EXPERT=y
> >
> > You could enable CONFIG_PREEMPT_RT ;)
> > CONFIG_PREEMPT_LAZY is probably also set a lot.
> >
> > That should be it.
>
> > > ------------------------------------------------------------------------
> > >
> > > And the following kernel-boot parameters:
> > >
> > > ------------------------------------------------------------------------
> > >
> > > rcutorture.onoff_interval=200 rcutorture.onoff_holdoff=30
> > > rcutree.gp_preinit_delay=12
> > > rcutree.gp_init_delay=3
> > > rcutree.gp_cleanup_delay=3
> > > rcutree.kthread_prio=2
> > > threadirqs
> > > rcutree.use_softirq=0
> > > rcutorture.preempt_duration=10
> > >
> > > ------------------------------------------------------------------------
> > >
> > > Some of these are for RCU's benefit, but what should I change to more
> > > closely approximate a typical real-time deployment?
> >
> > See above.
>
> Which got me this diff:
>
> ------------------------------------------------------------------------
>
> diff --git a/tools/testing/selftests/rcutorture/configs/rcu/TREE03 b/tools/testing/selftests/rcutorture/configs/rcu/TREE03
> index 2dc31b16e506..6158f5002497 100644
> --- a/tools/testing/selftests/rcutorture/configs/rcu/TREE03
> +++ b/tools/testing/selftests/rcutorture/configs/rcu/TREE03
> @@ -2,7 +2,9 @@ CONFIG_SMP=y
>  CONFIG_NR_CPUS=16
>  CONFIG_PREEMPT_NONE=n
>  CONFIG_PREEMPT_VOLUNTARY=n
> -CONFIG_PREEMPT=y
> +CONFIG_PREEMPT=n
> +CONFIG_PREEMPT_LAZY=y
> +CONFIG_PREEMPT_RT=y
>  #CHECK#CONFIG_PREEMPT_RCU=y
>  CONFIG_HZ_PERIODIC=y
>  CONFIG_NO_HZ_IDLE=n
> @@ -15,4 +17,5 @@ CONFIG_RCU_NOCB_CPU=n
>  CONFIG_DEBUG_LOCK_ALLOC=n
>  CONFIG_RCU_BOOST=y
>  CONFIG_DEBUG_OBJECTS_RCU_HEAD=n
> +CONFIG_EXPERT=y
>  CONFIG_RCU_EXPERT=y
>
> ------------------------------------------------------------------------
>
> But a 10-minute run got me the splat shown below, and in addition a
> shutdown-time hang.
>
> This is caused by RCU falling behind a callback-flooding kthread that
> invokes call_rcu() in a semi-tight loop.  Setting rcutree.kthread_prio=40
> avoids the splat, but still gets the shutdown-time hang.  Retrying with
> the default rcutree.kthread_prio=2 failed to reproduce the splat, but
> it did reproduce the shutdown-time hang.
>
> OK, maybe printk buffers are not being flushed?  A 100-millisecond sleep
> at the end of rcu_torture_cleanup() got all of rcutorture's output
> flushed, but lost the subsequent shutdown-time console traffic.  The
> pr_flush(HZ/10, 1) seems more sensible, but this is private to printk().
>
> I would like to log the shutdown-time console traffic because RCU can
> sometimes break things on that path.
>
> Thoughts?

Longer rcutorture runs showed (not unexpectedly) that the 100-millisecond
sleep was not always sufficient, nor was a 500-millisecond sleep.

There is a call to kmsg_dump(KMSG_DUMP_SHUTDOWN) in kernel_power_off()
that appears to be intended to dump out the printk() buffers, but it
does not seem to do so in kernels built with CONFIG_PREEMPT_RT=y.

Does there need to be a pr_flush() call prior to the call to
migrate_to_reboot_cpu()?  Or maybe even prior to
do_kernel_power_off_prepare() or kernel_shutdown_prepare()?

Adding John Ogness on CC so that he can tell me the error of my ways.
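To make the question concrete, the placement being asked about would
look something like this sketch of kernel/reboot.c, assuming pr_flush()
were made available outside of printk() (which it currently is not),
and with the earlier placements being the alternatives noted above:

------------------------------------------------------------------------

/* Sketch only: flush pending console output before this CPU goes away. */
void kernel_power_off(void)
{
	kernel_shutdown_prepare(SYSTEM_POWER_OFF);
	do_kernel_power_off_prepare();
	pr_flush(100, true);	/* hypothetical: up to 100 ms for consoles */
	migrate_to_reboot_cpu();
	syscore_shutdown();
	pr_emerg("Power down\n");
	kmsg_dump(KMSG_DUMP_SHUTDOWN);
	machine_power_off();
}

------------------------------------------------------------------------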
> PS:  I will do longer runs in case that splat was not a one-off.
> My concern is that I might need to adjust something more in order
> to get a reliable callback-flooding test.

And this was not a one-off.  Running ten 40-minute instances of the
new-age CONFIG_PREEMPT_RT=y TREE03 reliably triggers OOMs.  At first
glance, this appears to be an interaction between testing of RCU
priority boosting and RCU-callback flooding forward-progress testing.
And disabling testing of RCU priority boosting avoids these OOMs, as
does running without CONFIG_PREEMPT_RT=y.

My next step is to run with rcutorture.preempt_duration=0, which
disables within-guest-OS random preemption of kthreads.  If that
doesn't help, I expect to play around with avoiding concurrent testing
of RCU priority boosting and RCU callback-flooding forward progress.

Or is there a better way?

							Thanx, Paul
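P.S.  For anyone wanting to poke at this outside of rcutorture, the
callback flood amounts to something like the following, with
hypothetical names and with rcutorture's pacing and forward-progress
checking omitted:

------------------------------------------------------------------------

#include <linux/kernel.h>
#include <linux/kthread.h>
#include <linux/rcupdate.h>
#include <linux/sched.h>
#include <linux/slab.h>

struct flood_cb {
	struct rcu_head rh;
};

static void flood_cb_func(struct rcu_head *rhp)
{
	kfree(container_of(rhp, struct flood_cb, rh));
}

/*
 * Post callbacks about as fast as kmalloc() permits.  If RCU's
 * grace-period machinery cannot keep up, callbacks accumulate and
 * the system eventually OOMs, which is the failure mode above.
 */
static int flood_kthread(void *arg)
{
	struct flood_cb *fcb;

	while (!kthread_should_stop()) {
		fcb = kmalloc(sizeof(*fcb), GFP_KERNEL);
		if (fcb)
			call_rcu(&fcb->rh, flood_cb_func);
		cond_resched();	/* semi-tight: yield occasionally */
	}
	return 0;
}

------------------------------------------------------------------------

A real test would of course also need an rcu_barrier() before tearing
any of this down.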