On Wed, Jun 08, 2022 at 08:52:53AM +0800, Zhouyi Zhou wrote:
> Hi Paul
>
> Thank you for your constant guidance and encouragement!
>
> On Wed, Jun 8, 2022 at 7:42 AM Paul E. McKenney <paulmck@xxxxxxxxxx> wrote:
> >
> > On Fri, May 27, 2022 at 08:39:09AM +0800, Zhouyi Zhou wrote:
> > > Sometimes the kernel boots too fast for all of the grace periods
> > > required by rcu_tasks_verify_self_tests to complete.
> > >
> > > Temporarily reduce the RCU Tasks kthread sleep time for PROVE_RCU so
> > > that all of the required grace periods complete.
> > >
> > > Both this patch and its sister patch
> > > "wait extra jiffies for rcu_tasks_verify_self_tests"
> > > https://lore.kernel.org/rcu/20220517004522.25176-1-zhouzhouyi@xxxxxxxxx/
> > > have their shortcomings:
> > >
> > > 1) This patch does not slow down Linux boot, but it increases
> > > energy consumption during boot because of the reduced sleep time.
> > >
> > > 2) "wait extra jiffies for rcu_tasks_verify_self_tests" may slow the boot
> > > process, but it has no energy problems.
> > >
> > > Reported-by: Matthew Wilcox <willy@xxxxxxxxxxxxx>
> > > Suggested-by: Paul E. McKenney <paulmck@xxxxxxxxxx>
> > > Tested-by: Zhouyi Zhou <zhouzhouyi@xxxxxxxxx>
> > > Signed-off-by: Zhouyi Zhou <zhouzhouyi@xxxxxxxxx>
> > > ---
> > > Hi Paul
> > >
> > > I have proposed some possible solutions to fix the
> > > rcu_tasks_verify_self_tests failure. Neither of them is perfect; they
> > > only try to break the ice, hoping to attract attention ;-)
> >
> > First, please accept my apologies for the delay, and especially thank
> > you for continuing to dig into this!
> You are very welcome ;-) And thank you for your deep consideration!
>
> These days I am investigating torture.sh on linux-next (on both my Dell
> PowerEdge R720 server and my ThinkPad P1 Gen 4). It seems that the
> messages generated by lock_torture_print_module_parms and
> rcu_torture_print_module_parms do not reach uart_console_write, which
> leads to test failures. I think I am close to the answer.
> >
> > Your approach is not at all bad, but it would be good to leave
> > the underlying implementation alone if we can. One way to do this is to
> > wait for up to a fixed period of time for the grace period to complete,
> > for example, as shown in the patch below.
> Your patch is fantastic indeed! It overcomes all the shortcomings of
> my proposals! Why didn't I think of such a method?

There are a few advantages to my having been programming for almost
fifty years.  ;-)

And to be fair to you, it did take me some time to think of it, which
was one reason for the delay.

> > Thoughts?
> I have tested dev for 30 minutes (500 boots), and none of the boots failed.
> Then I reverted this fix:
> zzy@zzy-ThinkPad-P1-Gen-4i:~/Program/linux-rcu$ git checkout
> 504312bb6d39c22d6d0415993c2f9af6ce2b2bba
> Previous HEAD position was 3e95d4b287b3 rcu-tasks: Be more patient for
> RCU Tasks boot-time testing
> HEAD is now at 504312bb6d39 rcu-tasks: Update comments
> and my test script reported the failure on the first boot.
>
> Conclusion:
> The patch below elegantly fixes the problem reported by Matthew and me.
>
> Tested-by: Zhouyi Zhou <zhouzhouyi@xxxxxxxxx>

Thank you, and I will apply this on my next rebase.

							Thanx, Paul

> Thanks
> Zhouyi
> >
> >                                                         Thanx, Paul
> >
> > ------------------------------------------------------------------------
> >
> > commit 3e95d4b287b37ee5f7f82e5ebd749ab89fd706c2
> > Author: Paul E. McKenney <paulmck@xxxxxxxxxx>
> > Date:   Tue Jun 7 15:23:52 2022 -0700
> >
> >     rcu-tasks: Be more patient for RCU Tasks boot-time testing
> >
> >     The RCU-Tasks family of grace-period primitives can take some time to
> >     complete, and the amount of time can depend on the exact hardware and
> >     software configuration.  Some configurations boot up fast enough that the
> >     RCU-Tasks verification process gets false-positive failures.  This commit
> >     therefore allows up to 30 seconds for the grace periods to complete, with
> >     this value adjustable downwards using the rcupdate.rcu_task_stall_timeout
> >     kernel boot parameter.
> >
> >     Reported-by: Matthew Wilcox <willy@xxxxxxxxxxxxx>
> >     Reported-by: Zhouyi Zhou <zhouzhouyi@xxxxxxxxx>
> >     Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxx>
> >
> > diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
> > index f1209ce621c51..1a4c3adc5c397 100644
> > --- a/kernel/rcu/tasks.h
> > +++ b/kernel/rcu/tasks.h
> > @@ -145,6 +145,7 @@ static int rcu_task_ipi_delay __read_mostly = RCU_TASK_IPI_DELAY;
> >  module_param(rcu_task_ipi_delay, int, 0644);
> >
> >  /* Control stall timeouts.  Disable with <= 0, otherwise jiffies till stall. */
> > +#define RCU_TASK_BOOT_STALL_TIMEOUT (HZ * 30)
> >  #define RCU_TASK_STALL_TIMEOUT (HZ * 60 * 10)
> >  static int rcu_task_stall_timeout __read_mostly = RCU_TASK_STALL_TIMEOUT;
> >  module_param(rcu_task_stall_timeout, int, 0644);
> > @@ -1778,23 +1779,24 @@ struct rcu_tasks_test_desc {
> >  	struct rcu_head rh;
> >  	const char *name;
> >  	bool notrun;
> > +	unsigned long runstart;
> >  };
> >
> >  static struct rcu_tasks_test_desc tests[] = {
> >  	{
> >  		.name = "call_rcu_tasks()",
> >  		/* If not defined, the test is skipped. */
> > -		.notrun = !IS_ENABLED(CONFIG_TASKS_RCU),
> > +		.notrun = IS_ENABLED(CONFIG_TASKS_RCU),
> >  	},
> >  	{
> >  		.name = "call_rcu_tasks_rude()",
> >  		/* If not defined, the test is skipped. */
> > -		.notrun = !IS_ENABLED(CONFIG_TASKS_RUDE_RCU),
> > +		.notrun = IS_ENABLED(CONFIG_TASKS_RUDE_RCU),
> >  	},
> >  	{
> >  		.name = "call_rcu_tasks_trace()",
> >  		/* If not defined, the test is skipped. */
> > -		.notrun = !IS_ENABLED(CONFIG_TASKS_TRACE_RCU)
> > +		.notrun = IS_ENABLED(CONFIG_TASKS_TRACE_RCU)
> >  	}
> >  };
> >
> > @@ -1805,23 +1807,28 @@ static void test_rcu_tasks_callback(struct rcu_head *rhp)
> >
> >  	pr_info("Callback from %s invoked.\n", rttd->name);
> >
> > -	rttd->notrun = true;
> > +	rttd->notrun = false;
> >  }
> >
> >  static void rcu_tasks_initiate_self_tests(void)
> >  {
> > +	unsigned long j = jiffies;
> > +
> >  	pr_info("Running RCU-tasks wait API self tests\n");
> >  #ifdef CONFIG_TASKS_RCU
> > +	tests[0].runstart = j;
> >  	synchronize_rcu_tasks();
> >  	call_rcu_tasks(&tests[0].rh, test_rcu_tasks_callback);
> >  #endif
> >
> >  #ifdef CONFIG_TASKS_RUDE_RCU
> > +	tests[1].runstart = j;
> >  	synchronize_rcu_tasks_rude();
> >  	call_rcu_tasks_rude(&tests[1].rh, test_rcu_tasks_callback);
> >  #endif
> >
> >  #ifdef CONFIG_TASKS_TRACE_RCU
> > +	tests[2].runstart = j;
> >  	synchronize_rcu_tasks_trace();
> >  	call_rcu_tasks_trace(&tests[2].rh, test_rcu_tasks_callback);
> >  #endif
> > @@ -1831,11 +1838,18 @@ static int rcu_tasks_verify_self_tests(void)
> >  {
> >  	int ret = 0;
> >  	int i;
> > +	unsigned long bst = rcu_task_stall_timeout;
> >
> > +	if (bst <= 0 || bst > RCU_TASK_BOOT_STALL_TIMEOUT)
> > +		bst = RCU_TASK_BOOT_STALL_TIMEOUT;
> >  	for (i = 0; i < ARRAY_SIZE(tests); i++) {
> > -		if (!tests[i].notrun) {		// still hanging.
> > -			pr_err("%s has been failed.\n", tests[i].name);
> > -			ret = -1;
> > +		while (tests[i].notrun) {		// still hanging.
> > +			if (time_after(jiffies, tests[i].runstart + bst)) {
> > +				pr_err("%s has failed boot-time tests.\n", tests[i].name);
> > +				ret = -1;
> > +				break;
> > +			}
> > +			schedule_timeout_uninterruptible(1);
> >  		}
> >  	}
> >
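For anyone who wants to play with the bounded-wait idea from the patch outside the kernel, here is a minimal user-space sketch in C. It is an illustration under stated assumptions, not the kernel code: it assumes HZ is 1000 (one simulated "jiffy" is one millisecond of CLOCK_MONOTONIC time), and the names jiffies_now(), my_time_after(), boot_stall_timeout(), struct self_test, self_test_start(), self_test_callback(), and wait_for_test() are hypothetical stand-ins for jiffies, time_after(), the bst clamp, struct rcu_tasks_test_desc, and the polling loop in rcu_tasks_verify_self_tests(). A one-jiffy nanosleep() stands in for schedule_timeout_uninterruptible(1).

```c
#define _POSIX_C_SOURCE 200809L /* for clock_gettime() and nanosleep() */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>
#include <time.h>

/* Assumption: HZ = 1000, so one simulated "jiffy" is one millisecond. */
#define HZ 1000UL
#define BOOT_STALL_TIMEOUT (HZ * 30) /* mirrors RCU_TASK_BOOT_STALL_TIMEOUT */

/* Hypothetical stand-in for the kernel's jiffies counter. */
static unsigned long jiffies_now(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (unsigned long)ts.tv_sec * HZ +
	       (unsigned long)(ts.tv_nsec / 1000000L);
}

/*
 * Same idea as the kernel's time_after(a, b): true if a is later than b.
 * The signed difference keeps the answer right when the counter wraps.
 * (Converting an out-of-range unsigned value to long is implementation-
 * defined in ISO C; GCC and Clang wrap modulo 2^N.)
 */
static bool my_time_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/*
 * Clamp a stall-timeout parameter the way the patch does.  The int is
 * converted to unsigned long, so a negative value becomes huge and is
 * caught by the upper bound; "<= 0" on an unsigned value means "== 0".
 */
static unsigned long boot_stall_timeout(int stall_timeout)
{
	unsigned long bst = (unsigned long)stall_timeout;

	if (bst == 0 || bst > BOOT_STALL_TIMEOUT)
		bst = BOOT_STALL_TIMEOUT;
	return bst;
}

/* Hypothetical analogue of struct rcu_tasks_test_desc. */
struct self_test {
	const char *name;
	atomic_bool notrun;     /* true until the "callback" has run */
	unsigned long runstart; /* jiffies when the test was started */
};

/* Analogue of rcu_tasks_initiate_self_tests() for a single test. */
static void self_test_start(struct self_test *t, const char *name)
{
	t->name = name;
	atomic_init(&t->notrun, true);
	t->runstart = jiffies_now();
}

/* Analogue of test_rcu_tasks_callback(): mark the test as having run. */
static void self_test_callback(struct self_test *t)
{
	atomic_store(&t->notrun, false);
}

/*
 * Bounded wait: poll once per jiffy until the callback has run, giving
 * up after bst jiffies.  Returns 0 on success and -1 on timeout.
 */
static int wait_for_test(struct self_test *t, unsigned long bst)
{
	const struct timespec one_jiffy = { .tv_sec = 0, .tv_nsec = 1000000L };

	assert(bst > 0);
	while (atomic_load(&t->notrun)) {
		if (my_time_after(jiffies_now(), t->runstart + bst)) {
			fprintf(stderr, "%s has failed boot-time tests.\n", t->name);
			return -1;
		}
		nanosleep(&one_jiffy, NULL);
	}
	return 0;
}
```

The same conversion subtlety exists in the patch itself: rcu_task_stall_timeout is an int assigned to the unsigned long bst, so a negative setting wraps to a very large value and falls back to the 30-second default through the upper-bound check. Polling once per jiffy keeps the wait from spinning, while a test whose callback has already run returns immediately.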