On Fri, May 27, 2022 at 08:39:09AM +0800, Zhouyi Zhou wrote:
> Sometimes, the kernel will boot too fast for rcu_tasks_verify_self_tests
> to have all required grace periods.
>
> Temporarily reduce rcu tasks kthread sleep time for PROVE_RCU to get all
> required grace periods.
>
> Both this patch and its sister
> "wait extra jiffies for rcu_tasks_verify_self_tests"
> https://lore.kernel.org/rcu/20220517004522.25176-1-zhouzhouyi@xxxxxxxxx/
> have their shortcomings:
>
> 1) this patch doesn't slow down the Linux boot time but will increase
> the energy consumption during the boot because of reduced sleep time.
>
> 2) "wait extra jiffies for rcu_tasks_verify_self_tests" may slow the boot
> process but has no energy problems.
>
> Reported-by: Matthew Wilcox <willy@xxxxxxxxxxxxx>
> Suggested-by: Paul E. McKenney <paulmck@xxxxxxxxxx>
> Tested-by: Zhouyi Zhou <zhouzhouyi@xxxxxxxxx>
> Signed-off-by: Zhouyi Zhou <zhouzhouyi@xxxxxxxxx>
> ---
> Hi Paul
>
> I have proposed some possible solutions to fix rcu_tasks_verify_self_tests
> failure. Neither is perfect; I am only trying to break the ice,
> hoping to attract some attention ;-)

First, please accept my apologies for the delay, and especially thank
you for continuing to dig into this!

Your approach is not at all bad, but it would be good to leave the
underlying implementation alone if we can. One way to do this is to
wait for up to a fixed period of time for the grace period to complete,
for example, as shown in the patch below.

Thoughts?

							Thanx, Paul

------------------------------------------------------------------------

commit 3e95d4b287b37ee5f7f82e5ebd749ab89fd706c2
Author: Paul E. McKenney <paulmck@xxxxxxxxxx>
Date:   Tue Jun 7 15:23:52 2022 -0700

    rcu-tasks: Be more patient for RCU Tasks boot-time testing

    The RCU-Tasks family of grace-period primitives can take some time to
    complete, and the amount of time can depend on the exact hardware and
    software configuration.  Some configurations boot up fast enough that the
    RCU-Tasks verification process gets false-positive failures.  This commit
    therefore allows up to 30 seconds for the grace periods to complete, with
    this value adjustable downwards using the rcupdate.rcu_task_stall_timeout
    kernel boot parameter.

    Reported-by: Matthew Wilcox <willy@xxxxxxxxxxxxx>
    Reported-by: Zhouyi Zhou <zhouzhouyi@xxxxxxxxx>
    Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxx>

diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
index f1209ce621c51..1a4c3adc5c397 100644
--- a/kernel/rcu/tasks.h
+++ b/kernel/rcu/tasks.h
@@ -145,6 +145,7 @@ static int rcu_task_ipi_delay __read_mostly = RCU_TASK_IPI_DELAY;
 module_param(rcu_task_ipi_delay, int, 0644);
 
 /* Control stall timeouts.  Disable with <= 0, otherwise jiffies till stall. */
+#define RCU_TASK_BOOT_STALL_TIMEOUT (HZ * 30)
 #define RCU_TASK_STALL_TIMEOUT (HZ * 60 * 10)
 static int rcu_task_stall_timeout __read_mostly = RCU_TASK_STALL_TIMEOUT;
 module_param(rcu_task_stall_timeout, int, 0644);
@@ -1778,23 +1779,24 @@ struct rcu_tasks_test_desc {
 	struct rcu_head rh;
 	const char *name;
 	bool notrun;
+	unsigned long runstart;
 };
 
 static struct rcu_tasks_test_desc tests[] = {
 	{
 		.name = "call_rcu_tasks()",
 		/* If not defined, the test is skipped. */
-		.notrun = !IS_ENABLED(CONFIG_TASKS_RCU),
+		.notrun = IS_ENABLED(CONFIG_TASKS_RCU),
 	},
 	{
 		.name = "call_rcu_tasks_rude()",
 		/* If not defined, the test is skipped. */
-		.notrun = !IS_ENABLED(CONFIG_TASKS_RUDE_RCU),
+		.notrun = IS_ENABLED(CONFIG_TASKS_RUDE_RCU),
 	},
 	{
 		.name = "call_rcu_tasks_trace()",
 		/* If not defined, the test is skipped. */
-		.notrun = !IS_ENABLED(CONFIG_TASKS_TRACE_RCU)
+		.notrun = IS_ENABLED(CONFIG_TASKS_TRACE_RCU)
 	}
 };
 
@@ -1805,23 +1807,28 @@ static void test_rcu_tasks_callback(struct rcu_head *rhp)
 
 	pr_info("Callback from %s invoked.\n", rttd->name);
 
-	rttd->notrun = true;
+	rttd->notrun = false;
 }
 
 static void rcu_tasks_initiate_self_tests(void)
 {
+	unsigned long j = jiffies;
+
 	pr_info("Running RCU-tasks wait API self tests\n");
 #ifdef CONFIG_TASKS_RCU
+	tests[0].runstart = j;
 	synchronize_rcu_tasks();
 	call_rcu_tasks(&tests[0].rh, test_rcu_tasks_callback);
 #endif
 
 #ifdef CONFIG_TASKS_RUDE_RCU
+	tests[1].runstart = j;
 	synchronize_rcu_tasks_rude();
 	call_rcu_tasks_rude(&tests[1].rh, test_rcu_tasks_callback);
 #endif
 
 #ifdef CONFIG_TASKS_TRACE_RCU
+	tests[2].runstart = j;
 	synchronize_rcu_tasks_trace();
 	call_rcu_tasks_trace(&tests[2].rh, test_rcu_tasks_callback);
 #endif
@@ -1831,11 +1838,18 @@ static int rcu_tasks_verify_self_tests(void)
 {
 	int ret = 0;
 	int i;
+	unsigned long bst = rcu_task_stall_timeout;
 
+	if (bst <= 0 || bst > RCU_TASK_BOOT_STALL_TIMEOUT)
+		bst = RCU_TASK_BOOT_STALL_TIMEOUT;
 	for (i = 0; i < ARRAY_SIZE(tests); i++) {
-		if (!tests[i].notrun) {		// still hanging.
-			pr_err("%s has been failed.\n", tests[i].name);
-			ret = -1;
+		while (tests[i].notrun) {		// still hanging.
+			if (time_after(jiffies, tests[i].runstart + bst)) {
+				pr_err("%s has failed boot-time tests.\n", tests[i].name);
+				ret = -1;
+				break;
+			}
+			schedule_timeout_uninterruptible(1);
 		}
 	}
 
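
For readers who want to experiment with the bounded-wait idea outside the kernel, the following is a minimal userspace sketch, not the kernel code. Everything prefixed with sketch_ or SKETCH_ is hypothetical: SKETCH_HZ assumes 1000 ticks per second, the clamp takes a plain int for simplicity, and a simulated tick counter stands in for jiffies and for schedule_timeout_uninterruptible(1). It illustrates two points from the patch: the configured timeout is capped at the 30-second boot value, and the deadline check stays correct when the tick counter wraps.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical stand-ins for HZ and RCU_TASK_BOOT_STALL_TIMEOUT. */
#define SKETCH_HZ 1000UL
#define SKETCH_BOOT_STALL_TIMEOUT (SKETCH_HZ * 30)

/*
 * Wraparound-safe "a is later than b" for a free-running tick counter,
 * in the same spirit as the kernel's time_after().
 */
static bool sketch_time_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/*
 * Cap the configured timeout at the boot-time limit.  A non-positive
 * value (stall warnings disabled) or a too-large value both fall back
 * to the cap, so the timeout can only be adjusted downwards.
 */
static unsigned long sketch_clamp_timeout(int configured)
{
	if (configured <= 0 ||
	    (unsigned long)configured > SKETCH_BOOT_STALL_TIMEOUT)
		return SKETCH_BOOT_STALL_TIMEOUT;
	return (unsigned long)configured;
}

/*
 * Poll once per simulated tick until the "callback" fires or the
 * deadline passes.  The callback is simulated as firing clear_after
 * ticks after start.  Returns 0 on success, -1 on timeout.
 */
static int sketch_wait(unsigned long start, unsigned long timeout,
		       unsigned long clear_after)
{
	unsigned long now = start;
	bool notrun = true;

	while (notrun) {
		if (sketch_time_after(now, start + timeout))
			return -1;
		now++;	/* stands in for sleeping one jiffy */
		if (now - start >= clear_after)
			notrun = false;
	}
	return 0;
}
```

With these definitions, sketch_wait(ULONG_MAX - 5, 10, 3) succeeds even though the counter wraps during the wait, while sketch_wait(0, 10, 100) gives up once the deadline has passed.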