On Mon, Jun 13, 2022 at 12:01:24PM -0400, Waiman Long wrote: > On 6/10/22 16:58, Paul E. McKenney wrote: > > On Fri, Jun 10, 2022 at 02:42:12PM -0400, Waiman Long wrote: > > > Even though rcu_tasks selftest is initiated early in the boot process, > > > the verification done at late initcall time may not be late enough to > > > catch all the callbacks especially on systems with just a few cpus and > > > small memory. > > > > > > After 12 bootup's On a s390x system, 1 of them had failed rcu_tasks > > > verification test. > > > > > > [ 8.183013] call_rcu_tasks() has been failed. > > > [ 8.183041] WARNING: CPU: 0 PID: 1 at kernel/rcu/tasks.h:1696 rcu_tasks_verify_self_tests+0x64/0xd0 > > > [ 8.203246] Callback from call_rcu_tasks() invoked. > > > > > > In this particular case, the callback missed the check by about > > > 20ms. Similar rcu_tasks selftest failures are also seen in ppc64le > > > systems. > > > > > > [ 0.313391] call_rcu_tasks() has been failed. > > > [ 0.313407] WARNING: CPU: 0 PID: 1 at kernel/rcu/tasks.h:1696 rcu_tasks_verify_self_tests+0x5c/0xa0 > > > [ 0.335569] Callback from call_rcu_tasks() invoked. > > > > > > Avoid this missed callback by delaying the verification using > > > delayed_work. The delay is set to be about 0.1s which hopefully will > > > be long enough to catch all the callbacks on systems with few cpus and > > > small memory. > > > > > > Fixes: bfba7ed084f8 ("rcu-tasks: Add RCU-tasks self tests") > > > Signed-off-by: Waiman Long <longman@xxxxxxxxxx> > > Good catch, thank you! > > > > A few days ago, I queued this: > > > > 2585014188d5 ("rcu-tasks: Be more patient for RCU Tasks boot-time testing") > > > > This is shown in full at the end of this email. Does this fix this > > problem for you? > > I think your patch should fix the false positive warning and it give plenty > of time for this to happen. > > I do have one question though. rcu_tasks_verify_selft_tests() is called from > do_initcalls(). Since it may not be the last late initcall, does that mean > other late initcalls queued after that may be delayed by a second or more? Indeed. Which is why I would welcome the workqueues portion of your patch on top of the above patch in -rcu. ;-) Thanx, Paul > > > --- > > > kernel/rcu/tasks.h | 19 ++++++++++++++++--- > > > 1 file changed, 16 insertions(+), 3 deletions(-) > > > > > > diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h > > > index 3925e32159b5..25f964a671ba 100644s > > > --- a/kernel/rcu/tasks.h > > > +++ b/kernel/rcu/tasks.h > > > @@ -1735,7 +1735,7 @@ static void rcu_tasks_initiate_self_tests(void) > > > #endif > > > } > > > -static int rcu_tasks_verify_self_tests(void) > > > +static void rcu_tasks_verify_self_tests(struct work_struct *work __maybe_unused) > > > { > > > int ret = 0; > > > int i; > > > @@ -1749,10 +1749,23 @@ static int rcu_tasks_verify_self_tests(void) > > > if (ret) > > > WARN_ON(1); > > > +} > > > + > > > +static struct delayed_work rcu_tasks_verify_work; > > > - return ret; > > > +/* > > > + * The rcu_tasks verification is done indirectly via the work queue to > > > + * introduce an additional 0.1s delay to catch all the callbacks before > > > + * the verification is done as late_initcall time may not be late enough > > > + * to have all the callbacks fired. > > > + */ > > > +static int rcu_tasks_verify_schedule_work(void) > > > +{ > > > + INIT_DELAYED_WORK(&rcu_tasks_verify_work, rcu_tasks_verify_self_tests); > > > + schedule_delayed_work(&rcu_tasks_verify_work, HZ/10); > > > + return 0; > > > } > > > -late_initcall(rcu_tasks_verify_self_tests); > > > +late_initcall(rcu_tasks_verify_schedule_work); > > > #else /* #ifdef CONFIG_PROVE_RCU */ > > > static void rcu_tasks_initiate_self_tests(void) { } > > > #endif /* #else #ifdef CONFIG_PROVE_RCU */ > > > -- > > > 2.31.1 > > ------------------------------------------------------------------------ > > > > commit 2585014188d5e66052b4226b42602b6f3d921389 > > Author: Paul E. McKenney <paulmck@xxxxxxxxxx> > > Date: Tue Jun 7 15:23:52 2022 -0700 > > > > rcu-tasks: Be more patient for RCU Tasks boot-time testing > > The RCU-Tasks family of grace-period primitives can take some time to > > complete, and the amount of time can depend on the exact hardware and > > software configuration. Some configurations boot up fast enough that the > > RCU-Tasks verification process gets false-positive failures. This commit > > therefore allows up to 30 seconds for the grace periods to complete, with > > this value adjustable downwards using the rcupdate.rcu_task_stall_timeout > > kernel boot parameter. > > Reported-by: Matthew Wilcox <willy@xxxxxxxxxxxxx> > > Reported-by: Zhouyi Zhou <zhouzhouyi@xxxxxxxxx> > > Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxx> > > Tested-by: Zhouyi Zhou <zhouzhouyi@xxxxxxxxx> > > > > diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h > > index df6b2cb2f205d..fcbd0ec33c866 100644 > > --- a/kernel/rcu/tasks.h > > +++ b/kernel/rcu/tasks.h > > @@ -145,6 +145,7 @@ static int rcu_task_ipi_delay __read_mostly = RCU_TASK_IPI_DELAY; > > module_param(rcu_task_ipi_delay, int, 0644); > > /* Control stall timeouts. Disable with <= 0, otherwise jiffies till stall. */ > > +#define RCU_TASK_BOOT_STALL_TIMEOUT (HZ * 30) > > #define RCU_TASK_STALL_TIMEOUT (HZ * 60 * 10) > > static int rcu_task_stall_timeout __read_mostly = RCU_TASK_STALL_TIMEOUT; > > module_param(rcu_task_stall_timeout, int, 0644); > > @@ -1776,23 +1777,24 @@ struct rcu_tasks_test_desc { > > struct rcu_head rh; > > const char *name; > > bool notrun; > > + unsigned long runstart; > > }; > > static struct rcu_tasks_test_desc tests[] = { > > { > > .name = "call_rcu_tasks()", > > /* If not defined, the test is skipped. */ > > - .notrun = !IS_ENABLED(CONFIG_TASKS_RCU), > > + .notrun = IS_ENABLED(CONFIG_TASKS_RCU), > > }, > > { > > .name = "call_rcu_tasks_rude()", > > /* If not defined, the test is skipped. */ > > - .notrun = !IS_ENABLED(CONFIG_TASKS_RUDE_RCU), > > + .notrun = IS_ENABLED(CONFIG_TASKS_RUDE_RCU), > > }, > > { > > .name = "call_rcu_tasks_trace()", > > /* If not defined, the test is skipped. */ > > - .notrun = !IS_ENABLED(CONFIG_TASKS_TRACE_RCU) > > + .notrun = IS_ENABLED(CONFIG_TASKS_TRACE_RCU) > > } > > }; > > @@ -1803,23 +1805,28 @@ static void test_rcu_tasks_callback(struct rcu_head *rhp) > > pr_info("Callback from %s invoked.\n", rttd->name); > > - rttd->notrun = true; > > + rttd->notrun = false; > > } > > static void rcu_tasks_initiate_self_tests(void) > > { > > + unsigned long j = jiffies; > > + > > pr_info("Running RCU-tasks wait API self tests\n"); > > #ifdef CONFIG_TASKS_RCU > > + tests[0].runstart = j; > > synchronize_rcu_tasks(); > > call_rcu_tasks(&tests[0].rh, test_rcu_tasks_callback); > > #endif > > #ifdef CONFIG_TASKS_RUDE_RCU > > + tests[1].runstart = j; > > synchronize_rcu_tasks_rude(); > > call_rcu_tasks_rude(&tests[1].rh, test_rcu_tasks_callback); > > #endif > > #ifdef CONFIG_TASKS_TRACE_RCU > > + tests[2].runstart = j; > > synchronize_rcu_tasks_trace(); > > call_rcu_tasks_trace(&tests[2].rh, test_rcu_tasks_callback); > > #endif > > @@ -1829,11 +1836,18 @@ static int rcu_tasks_verify_self_tests(void) > > { > > int ret = 0; > > int i; > > + unsigned long bst = rcu_task_stall_timeout; > > + if (bst <= 0 || bst > RCU_TASK_BOOT_STALL_TIMEOUT) > > + bst = RCU_TASK_BOOT_STALL_TIMEOUT; > > for (i = 0; i < ARRAY_SIZE(tests); i++) { > > - if (!tests[i].notrun) { // still hanging. > > - pr_err("%s has been failed.\n", tests[i].name); > > - ret = -1; > > + while (tests[i].notrun) { // still hanging. > > + if (time_after(jiffies, tests[i].runstart + bst)) { > > + pr_err("%s has failed boot-time tests.\n", tests[i].name); > > + ret = -1; > > + break; > > + } > > + schedule_timeout_uninterruptible(1); > > } > > } > > >