> On Dec 15, 2019, at 3:16 PM, Paul E. McKenney <paulmck@xxxxxxxxxx> wrote:
>
> On Sun, Dec 15, 2019 at 01:52:42AM -0500, Qian Cai wrote:
>> The commit 82150cb53dcb ("rcu: React to callback overload by
>> aggressively seeking quiescent states") introduced an infinite loop
>> during boot here,
>>
>>	// Reset overload indication for CPUs no longer overloaded
>>	for_each_leaf_node_cpu_mask(rnp, cpu, rnp->cbovldmask) {
>>		rdp = per_cpu_ptr(&rcu_data, cpu);
>>		check_cb_ovld_locked(rdp, rnp);
>>	}
>>
>> because on an affected machine,
>>
>> rnp->cbovldmask = 0
>> rnp->grphi = 127
>> rnp->grplo = 0
>
> Your powerpc.config file from your first email shows:
>
> rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=128
>
> So this should be the root rcu_node structure (as opposed to one of the
> eight leaf rcu_node structures, each of which should have the difference
> between rnp->grphi and rnp->grplo equal to 15).  And RCU should not be
> invoking for_each_leaf_node_cpu_mask() on this rcu_node structure.
>
>> It ends up with "cpu" always being 64, so the loop can never exit
>> because "cpu <= rnp->grphi" remains true.  It is pointless to enter
>> the loop when the cpumask is 0, as there is no CPU that could match it.
>>
>> Fixes: 82150cb53dcb ("rcu: React to callback overload by aggressively seeking quiescent states")
>> Signed-off-by: Qian Cai <cai@xxxxxx>
>> ---
>>  kernel/rcu/rcu.h | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
>> index ab504fbc76ca..fb691ec86df4 100644
>> --- a/kernel/rcu/rcu.h
>> +++ b/kernel/rcu/rcu.h
>> @@ -363,7 +363,7 @@ static inline void rcu_init_levelspread(int *levelspread, const int *levelcnt)
>>  	((rnp)->grplo + find_next_bit(&(mask), BITS_PER_LONG, (cpu)))
>>  #define for_each_leaf_node_cpu_mask(rnp, cpu, mask) \
>>  	for ((cpu) = rcu_find_next_bit((rnp), 0, (mask)); \
>> -	     (cpu) <= rnp->grphi; \
>> +	     (cpu) <= rnp->grphi && (mask); \
>>  	     (cpu) = rcu_find_next_bit((rnp), (cpu) + 1 - (rnp->grplo), (mask)))
>
> This change cannot be right, but has to be one of the better bug reports
> I have ever received, so thank you very much, greatly appreciated!!!
>
> So if I say the above change cannot be right, what change might work?
>
> Again, it would certainly be a bug to invoke for_each_leaf_node_cpu_mask()
> on anything but one of the leaf rcu_node structures, as you might guess
> from the name.  And as noted above, your dump of the rcu_node fields
> above looks like it is exactly that bug that happened.  So let's review
> the uses of this macro:
>
> kernel/rcu/tree.c:
>
> o	rcu_gp_cleanup() invokes for_each_leaf_node_cpu_mask() within a
>	srcu_for_each_node_breadth_first() loop, which includes non-leaf
>	rcu_node structures.  This is a bug!  Please see patch below.
>
> o	force_qs_rnp() is OK because for_each_leaf_node_cpu_mask() is
>	invoked within a rcu_for_each_leaf_node() loop.
>
> kernel/rcu/tree_exp.h:
>
> o	rcu_report_exp_cpu_mult() invokes it on whatever rcu_node structure
>	was passed in:
>
>	o	sync_rcu_exp_select_node_cpus() also relies on its
>		caller (via workqueues) to do the right thing.
>
>	o	sync_rcu_exp_select_cpus() invokes
>		sync_rcu_exp_select_node_cpus() from within a
>		rcu_for_each_leaf_node() loop, so is OK.
>
>	o	sync_rcu_exp_select_cpus() also invokes
>		sync_rcu_exp_select_node_cpus() indirectly
>		via workqueues, but also from within the same
>		rcu_for_each_leaf_node() loop, so is OK.
>
> o	rcu_report_exp_rdp() invokes rcu_report_exp_cpu_mult()
>	on the rcu_node structure corresponding to some
>	specific CPU, which will always be a leaf rcu_node
>	structure.
>
> Again, thank you very much for your testing and debugging efforts!
> I have queued the three (almost untested) patches below, the first of
> which I will fold into the buggy "rcu: React to callback overload by
> aggressively seeking quiescent states" patch, the second of which I will
> apply to prevent future bugs of this sort, even when running on small
> systems, and the third of which will allow me to easily run rcutorture
> tests on the larger systems that I have recently gained easy access to.
>
> And the big question...  Does the first patch clear up your hangs?  ;-)
> Either way, please let me know!
>
>							Thanx, Paul
>
> ------------------------------------------------------------------------
>
> commit e8d6182b015bdd8221164477f4ab1c307bd2fbe9
> Author: Paul E. McKenney <paulmck@xxxxxxxxxx>
> Date:   Sun Dec 15 10:59:06 2019 -0800
>
>     squash! rcu: React to callback overload by aggressively seeking quiescent states
>
>     [ paulmck: Fix bug located by Qian Cai. ]
>     Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxx>
>
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 1d0935e..48fba22 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -1760,10 +1760,11 @@ static void rcu_gp_cleanup(void)
>  		/* smp_mb() provided by prior unlock-lock pair. */
>  		needgp = rcu_future_gp_cleanup(rnp) || needgp;
>  		// Reset overload indication for CPUs no longer overloaded
> -		for_each_leaf_node_cpu_mask(rnp, cpu, rnp->cbovldmask) {
> -			rdp = per_cpu_ptr(&rcu_data, cpu);
> -			check_cb_ovld_locked(rdp, rnp);
> -		}
> +		if (rcu_is_leaf_node(rnp))
> +			for_each_leaf_node_cpu_mask(rnp, cpu, rnp->cbovldmask) {
> +				rdp = per_cpu_ptr(&rcu_data, cpu);
> +				check_cb_ovld_locked(rdp, rnp);
> +			}
>  		sq = rcu_nocb_gp_get(rnp);
>  		raw_spin_unlock_irq_rcu_node(rnp);
>  		rcu_nocb_gp_cleanup(sq);
>

This works fine.

Tested-by: Qian Cai <cai@xxxxxx>