Re: Reproducing call_rcu_tasks self-test bug on a dual sockets Intel Xeon machine

"Paul E. McKenney" <paulmck@xxxxxxxxxx> · Tue, 3 May 2022 07:40:24 -0700

On Tue, May 03, 2022 at 08:53:47AM +0800, Zhouyi Zhou wrote:
> Hi Paul
> Thanks for your guidance!

And thank you for persisting on this!

> On Sat, Apr 23, 2022 at 1:35 AM Zhouyi Zhou <zhouzhouyi@xxxxxxxxx> wrote:
> >
> > On Sat, Apr 23, 2022 at 12:00 AM Paul E. McKenney <paulmck@xxxxxxxxxx> wrote:
> > >
> > > On Fri, Apr 22, 2022 at 01:28:51PM +0800, Zhouyi Zhou wrote:
> > > > Hi Paul
> > > >
> > > > On Wed, Apr 20, 2022 at 7:04 AM Paul E. McKenney <paulmck@xxxxxxxxxx> wrote:
> > > > >
> > > > > On Mon, Apr 18, 2022 at 11:16:16AM +0800, Zhouyi Zhou wrote:
> > > > > > 1. Abstract
> > > > > > Previously, Matthew has reported a call_rcu_tasks bug [1]. But it is
> > > > > > hard to reproduce on a dual socket Intel machine. This report
> > > > > > developed a method to reproduce that bug on such a machine.
> > > > > >
> > > > > > 2. Introduction
> > > > > > Matthew has reported a call_rcu_tasks bug [1]. It is easy to reproduce
> > > > > > it on Thinkpad P1 gen4 (Intel i7-11800H) with following steps:
> > > > > > a) cp [2] to .config
> > > > > > b) build the kernel
> > > > > > c) execute  kvm -smp 4 -net none -serial file:/tmp/console.log -m 512
> > > > > > -kernel vmlinux -append "console=ttyS0"
> > > > > >
> > > > > > The bug can even be reproduced on Thinkpad P1 gen4 (i7-11800H) by
> > > > > > stress testing the TASKS01 with the script [3].
> > > > > >
> > > > > > However, the bug can't be reproduced by above methods on a dual socket
> > > > > > Intel Xeon machine (PowerEdge R720, Intel(R) Xeon(R) CPU E5-2660 0 @
> > > > > > 2.20GHz, 2 CPUs, 16 cores, 32 threads).
> > > > > >
> > > > > > 3. Discover the reason behind the bug
> > > > > > On thinkpad P1 gen4, I tried to discover the reason behind the bug.
> > > > > > After some experimentation, I found that removing CONFIG_NFS_V4 from
> > > > > > the kernel config file will greatly increase the probabilities of
> > > > > > triggering the bug.
> > > > > >
> > > > > > I guess maybe the init calls lend the grace period to RCU tasks self
> > > > > > tests, so I add some debug code to a test kernel to see what happens:
> > > > > > root@e0816662e3db:/mnt/linux# git diff
> > > > > > diff --git a/init/main.c b/init/main.c
> > > > > > index 98182c3c2c4b..9731566014cc 100644
> > > > > > --- a/init/main.c
> > > > > > +++ b/init/main.c
> > > > > > @@ -1356,6 +1356,8 @@ static int __init ignore_unknown_bootoption(char
> > > > > > *param, char *val,
> > > > > >         return 0;
> > > > > >  }
> > > > > >
> > > > > > +extern bool *notrunp;
> > > > > > +unsigned int numofcalls = 0;
> > > > > >  static void __init do_initcall_level(int level, char *command_line)
> > > > > >  {
> > > > > >         initcall_entry_t *fn;
> > > > > > @@ -1367,8 +1369,19 @@ static void __init do_initcall_level(int level,
> > > > > > char *command_line)
> > > > > >                    NULL, ignore_unknown_bootoption);
> > > > > >
> > > > > >         trace_initcall_level(initcall_level_names[level]);
> > > > > > -       for (fn = initcall_levels[level]; fn < initcall_levels[level+1]; fn++)
> > > > > > +       for (fn = initcall_levels[level]; fn < initcall_levels[level+1]; fn++) {
> > > > > > +               bool before = *notrunp, after;
> > > > > > +               unsigned long jiffies_save = jiffies;
> > > > > > +               numofcalls++;
> > > > > >                 do_one_initcall(initcall_from_entry(fn));
> > > > > > +               after = *notrunp;
> > > > > > +               if (before != after) {
> > > > > > +                 char sym[KSYM_SYMBOL_LEN];
> > > > > > +                 sprint_symbol(sym, (unsigned long )initcall_from_entry(fn));
> > > > > > +                 printk(KERN_INFO "init call level %d: %s changed the
> > > > > > rcu task test %lu\n",
> > > > > > +                        level, sym, jiffies - jiffies_save);
> > > > > > +               }
> > > > > > +       }
> > > > > >  }
> > > > > >
> > > > > >  static void __init do_initcalls(void)
> > > > > > @@ -1376,15 +1389,17 @@ static void __init do_initcalls(void)
> > > > > >         int level;
> > > > > >         size_t len = strlen(saved_command_line) + 1;
> > > > > >         char *command_line;
> > > > > > -
> > > > > > +       unsigned long jiffies_save = jiffies;
> > > > > >         command_line = kzalloc(len, GFP_KERNEL);
> > > > > >         if (!command_line)
> > > > > >                 panic("%s: Failed to allocate %zu bytes\n", __func__, len);
> > > > > > -
> > > > > > +
> > > > > >         for (level = 0; level < ARRAY_SIZE(initcall_levels) - 1; level++) {
> > > > > >                 /* Parser modifies command_line, restore it each time */
> > > > > >                 strcpy(command_line, saved_command_line);
> > > > > >                 do_initcall_level(level, command_line);
> > > > > > +               if (level == 5)
> > > > > > +                 printk(KERN_INFO"init call level 5 costs: %lu
> > > > > > numofcalls %u\n", jiffies - jiffies_save, numofcalls);
> > > > > >         }
> > > > > >
> > > > > >         kfree(command_line);
> > > > > > diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
> > > > > > index 99cf3a13954c..f4e175eeaec3 100644
> > > > > > --- a/kernel/rcu/tasks.h
> > > > > > +++ b/kernel/rcu/tasks.h
> > > > > > @@ -1660,7 +1660,7 @@ static void test_rcu_tasks_callback(struct rcu_head *rhp)
> > > > > >
> > > > > >         rttd->notrun = true;
> > > > > >  }
> > > > > > -
> > > > > > +bool *notrunp = &tests[0].notrun;
> > > > > >  static void rcu_tasks_initiate_self_tests(void)
> > > > > >  {
> > > > > >         pr_info("Running RCU-tasks wait API self tests\n");
> > > > > >
> > > > > > The tests on Thinkpad P1 gen4 and Dell PowerEdge R720 show that during
> > > > > > the test, some init call will lend the grace period to the RCU tasks
> > > > > > self tests.
> > > > > >
> > > > > > And of course, I am not using the above modification in Section 5 and
> > > > > > Section 6' tests. Section 5 and Section 6's tests are performed on
> > > > > > unmodified mainline!!
> > > > > >
> > > > > > 4. Reduce the init calls in kernel configuration
> > > > > > Based on above observation, I tried to reduce the kernel configuration
> > > > > > starting from [2] meanwhile keep the qemu bootable. Finally I got
> > > > > > [4].
> > > > > > And [4] will trigger the bug on Dell PowerEdge R720!
> > > > > >
> > > > > > 5. Trigger the bug in a more Linux way.
> > > > > > Why TASKS01 will not trigger the bug? There are two reasons:
> > > > > > a) there are too many init calls that will lend the grace period
> > > > > > b) the debug kernel param will lend the grace period
> > > > > >
> > > > > > Solution to reason a:
> > > > > > I write a C++ program [5] to print the difference between the .config
> > > > > > generated by TASKS01 and [4]. Then I use the difference to create a
> > > > > > TASKS04.
> > > > > >
> > > > > > Solution to reason b:
> > > > > > I add loglevel=7 to qemu boot parameter.
> > > > > >
> > > > > > The result consists of newly created TASKS04 and TASKS04.boot [6].
> > > > > >
> > > > > > Now the bug can trigger every time on Dell PowerEdge R720 by invoking:
> > > > > > ./tools/testing/selftests/rcutorture/bin/kvm.sh --configs "TASKS04"
> > > > > > --duration 10 --trust-make
> > > > > >
> > > > > > 6. Conclusion
> > > > > > I am able to trigger the bug everytime in mainline without modifying
> > > > > > the kernel's functional code on a dual socket Intel Xeon machine.
> > > > > >
> > > > > > 7. Acknowledgement
> > > > > > Thanks Paul for your constant encouragement! Thanks Zhouyi
> > > > >
> > > > > And thank you for digging into this!
> > > > You are very welcome ;-)
> > > > >
> > > > References
> > > > [1] https://lore.kernel.org/rcu/20220322142101.GQ4285@paulmck-ThinkPad-P17-Gen-1/T/#mb696465963393bbc3d38e85ffef0e03cf1e4fa54
> > > > [2] http://154.223.142.244/20220321/config-20220321
> > > > [3] http://154.223.142.244/20220418/test.sh.txt
> > > > [4] http://154.223.142.244/20220418/config
> > > > [5] http://154.223.142.244/20220418/processconfig.cpp
> > > > [6] http://154.223.142.244/20220418/0001-RCU.patch
> > > > > Here is a sneak preview of a lightly tested alleged fix, which may be
> > > > > found on the -rcu tree's "dev" branch:
> > > > >
> > > > > a3b1f9e07d47 rcu-tasks: Split rcu_tasks_one_gp() from rcu_tasks_kthread()
> > > > > 130dd60ddd96 rcu-tasks: Move synchronize_rcu_tasks_generic() down
> > > > > 7e510e661ea5 rcu-tasks: Drive synchronous grace periods from calling task
> > > > >
> > > > > Note that these are in the order that they should be applied rather than
> > > > > in git-log order.
> > > > I have been testing the above patches these days, on my Dell Poweredge
> > > > R720, the bug seems to be gone. But on my Thinkpad P1 gen4, the bug
> > > > seems to be there still.  Sorry to report that bad news.
> > >
> > > Hey, there is some improvement, so that counts as good news!  Besides,
> > > it is what it is.  So thank you for testing and for letting me know both
> > > the good and the not-quite-so-good news.  ;-)
> > Yes, there is improvement, without your patch, the guest OS in my Dell
> > R720 server will certainly issue the warnings at the first boot, with
> > your patch, the guest OS booted a thousand times without warning ;-) I
> > am very glad to be of some help ;-) You are very welcome, and thank
> > Paul for your encouragement !
> > >
> > > > I will study your patches carefully and double check my test
> > > > environment in the coming days.
> > >
> > > Could you please trace the series of calls to these functions up to
> > > the faiure?
> > >
> > >         synchronize_rcu_tasks()
> > >         call_rcu_tasks()
> > >         synchronize_rcu_tasks_rude()
> > >         call_rcu_tasks_rude()
> > >         synchronize_rcu_tasks_trace()
> > >         call_rcu_tasks_trace()
> > >         synchronize_rcu_tasks_generic()
> > >         rcu_tasks_one_gp()
> I added printk statement at the beginning of these functions, and
> print the holdout tasks in rcu_tasks_verify_self_tests, and print the
> backtrace of rcu task kthread at moment of fail:
> http://154.220.3.115/May022022/hook.diff
> 
> Then I run kvm test on my Thinkpad P1 gen 4 (Intel i7-11800H):
> http://154.220.3.115/May022022/console.log
> 
> >From above log, I see
> [    0.439296]  <TASK>
> [    0.439420]  __schedule+0x246/0x7b0
> [    0.439625]  ? mark_held_locks+0x49/0x70
> [    0.439852]  schedule+0x53/0xc0
> [    0.440035]  schedule_timeout+0x1a5/0x310
> [    0.440268]  ? internal_add_timer+0x30/0x30
> [    0.440516]  rcu_tasks_wait_gp+0x22e/0x340
> [    0.440896]  rcu_tasks_one_gp+0x336/0x3b6
> [    0.441209]  ? rcu_tasks_postscan+0x10/0x10
> [    0.441447]  rcu_tasks_kthread.cold+0xa/0x18
> [    0.441699]  kthread+0xe0/0x110
> [    0.441883]  ? kthread_exit+0x30/0x30
> [    0.442098]  ret_from_fork+0x1f/0x30
> [    0.442307]  </TASK>
> 
> So, it is rcu task kthread's sleeping (schedule_timeout_idle(fract))
> that prevents the holdout tasks being harvested.

Nice!

So the machine hitting this was doing so because it was booting too
fast.  ;-)

> Then I tried to reduce value of rtp->init_fract before
> rcu_tasks_verify_self_tests and recover the original value after
> rcu_tasks_verify_self_tests:
> diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
> index bd9f2e24f5c7..29c4a4a8fb51 100644
> --- a/kernel/rcu/tasks.h
> +++ b/kernel/rcu/tasks.h
> @@ -963,7 +963,11 @@ static int __init rcu_spawn_tasks_kthread(void)
>  {
>         cblist_init_generic(&rcu_tasks);
>         rcu_tasks.gp_sleep = HZ / 10;
> +#ifdef CONFIG_PROVE_RCU
> +       rcu_tasks.init_fract = HZ / 100;
> +#else
>         rcu_tasks.init_fract = HZ / 10;
> +#endif

You could save a few lines as follows:

	rcu_tasks.init_fract = HZ / (IS_ENABLED(CONFIG_PROVE_RCU) ? 100 : 10)

>         rcu_tasks.pregp_func = rcu_tasks_pregp_step;
>         rcu_tasks.pertask_func = rcu_tasks_pertask;
>         rcu_tasks.postscan_func = rcu_tasks_postscan;
> @@ -1777,6 +1781,7 @@ static int rcu_tasks_verify_self_tests(void)
>         if (ret)
>                 WARN_ON(1);
> 
> +       rcu_tasks.init_fract = HZ / 10;

Another way of handling this would be to leave rcu_spawn_tasks_kthread()
alone and add this to the top of rcu_tasks_verify_self_tests():

	if (IS_ENABLED(CONFIG_PROVE_RCU))
		schedule_timeout_uninterruptible(2 * rcu_tasks.init_fract);

Then you would not need this assignment to rcu_tasks.init_fract.

But you would also need something to handle the case of kernels built
with CONFIG_TASKS_RCU=n.  For example and accessor function that
returned the value of rcu_tasks.init_fract when CONFIG_TASKS_RCU=y and
zero otherwise.  (And add this to the "if" condition if a zero argument
to the schedule_timeout_uninterruptible() function still sleeps.)

Or am I missing something in your diagnosis of this problem?

>         return ret;
>  }
>  late_initcall(rcu_tasks_verify_self_tests);
> 
> After patching above, the warning didn't reappear when booting the
> kernel for 1000 times on my Thinkpad P1 gen4 (without the patch, the
> warning will show at the first boot).
> 
> Do my above discovery worth of a formal patch ;-) because I am afraid
> adding above operations may slow the kernel boot process a little bit
> (I have not tested the exact slowdown, just a guess). Also, did my
> modification also apply to rcu_tasks_rude and rcu_tasks_trace? (I am
> going to study rcu_tasks_rude and rcu_tasks_trace in the following
> days ;-))

We do need a fix here because we don't want to punish people who make
Linux boot faster.  ;-)

The trick is making the fix operate correctly in all Kconfig combinations
and also making sure that it does not slow down boot unnecessarily.

							Thanx, Paul

> Thanks a lot
> Cheers
> Zhouyi
> > I will do it! This itself is such a valuable process of learning and
> > self promoting which I can't obtain anywhere else by any means to be
> > honest.
> > >
> > > One possibility is that the RCU tasks GP kthreads are not getting
> > > scheduled during that time on your Thinkpad.  If so, something
> > > like schedule_timeout_uninterruptible(1) at the beginning of
> > > rcu_tasks_verify_self_tests() should avoid the failure.  (Not a proper
> > > fix, of course!  Just for testing.)
> > I will do the tests in the coming days.
> > >
> > > > I add Uladzislau, author of bfba7ed084f8a  rcu-tasks: Add RCU-tasks
> > > > self tests to cc list.
> > >
> > > Thank you!
> > >
> > > Huh.  With my patch, it should be possible to move the
> > > invocation of rcu_tasks_initiate_self_tests() to the beginning of
> > > rcu_init_tasks_generic().
> > I will also try that.
> >
> > Thank you for your patience!
> >
> > Thanks again
> > Zhouyi
> > >
> > >                                                         Thanx, Paul
> > >
> > > > Kind Regards
> > > > Sincerely yours
> > > > Zhouyi
> > > > >
> > > > >                                                         Thanx, Paul