> On Dec 15, 2019, at 3:16 PM, Paul E. McKenney <paulmck@xxxxxxxxxx> wrote:
>
> On Sun, Dec 15, 2019 at 01:52:42AM -0500, Qian Cai wrote:
>> The commit 82150cb53dcb ("rcu: React to callback overload by
>> aggressively seeking quiescent states") introduced an infinite loop
>> during boot here,
>>
>>	// Reset overload indication for CPUs no longer overloaded
>>	for_each_leaf_node_cpu_mask(rnp, cpu, rnp->cbovldmask) {
>>		rdp = per_cpu_ptr(&rcu_data, cpu);
>>		check_cb_ovld_locked(rdp, rnp);
>>	}
>>
>> because on an affected machine,
>>
>> rnp->cbovldmask = 0
>> rnp->grphi = 127
>> rnp->grplo = 0
>
> Your powerpc.config file from your first email shows:
>
> rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=128
>
> So this should be the root rcu_node structure (as opposed to one of the
> eight leaf rcu_node structures, each of which should have the difference
> between rnp->grphi and rnp->grplo equal to 15).  And RCU should not be
> invoking for_each_leaf_node_cpu_mask() on this rcu_node structure.
>
>> It ends up with "cpu" always being 64, so the loop can never exit
>> because "cpu <= rnp->grphi" remains true.  It is pointless to enter
>> the loop when the cpumask is 0, as there is no CPU that could match it.
>>
>> Fixes: 82150cb53dcb ("rcu: React to callback overload by aggressively seeking quiescent states")
>> Signed-off-by: Qian Cai <cai@xxxxxx>
>> ---
>>  kernel/rcu/rcu.h | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
>> index ab504fbc76ca..fb691ec86df4 100644
>> --- a/kernel/rcu/rcu.h
>> +++ b/kernel/rcu/rcu.h
>> @@ -363,7 +363,7 @@ static inline void rcu_init_levelspread(int *levelspread, const int *levelcnt)
>>  	((rnp)->grplo + find_next_bit(&(mask), BITS_PER_LONG, (cpu)))
>>  #define for_each_leaf_node_cpu_mask(rnp, cpu, mask) \
>>  	for ((cpu) = rcu_find_next_bit((rnp), 0, (mask)); \
>> -	     (cpu) <= rnp->grphi; \
>> +	     (cpu) <= rnp->grphi && (mask); \
>>  	     (cpu) = rcu_find_next_bit((rnp), (cpu) + 1 - (rnp->grplo), (mask)))
>
> This change cannot be right, but has to be one of the better bug reports
> I have ever received, so thank you very much, greatly appreciated!!!
>
> So if I say the above change cannot be right, what change might work?
>
> Again, it would certainly be a bug to invoke for_each_leaf_node_cpu_mask()
> on anything but one of the leaf rcu_node structures, as you might guess
> from the name.  And as noted above, your dump of the rcu_node fields
> above looks like it is exactly that bug that happened.  So let's review
> the uses of this macro:
>
> kernel/rcu/tree.c:
>
> o	rcu_gp_cleanup() invokes for_each_leaf_node_cpu_mask() within a
>	srcu_for_each_node_breadth_first() loop, which includes non-leaf
>	rcu_node structures.  This is a bug!  Please see patch below.
>
> o	force_qs_rnp() is OK because for_each_leaf_node_cpu_mask() is
>	invoked within a rcu_for_each_leaf_node() loop.
>
> kernel/rcu/tree_exp.h:
>
> o	rcu_report_exp_cpu_mult() invokes it on whatever rcu_node structure
>	was passed in:
>
>	o	sync_rcu_exp_select_node_cpus() also relies on its
>		caller (via workqueues) to do the right thing.
>
>	o	sync_rcu_exp_select_cpus() invokes
>		sync_rcu_exp_select_node_cpus() from within a
>		rcu_for_each_leaf_node() loop, so is OK.
>
>	o	sync_rcu_exp_select_cpus() also invokes
>		sync_rcu_exp_select_node_cpus() indirectly
>		via workqueues, but also from within the same
>		rcu_for_each_leaf_node() loop, so is OK.
>
> o	rcu_report_exp_rdp() invokes rcu_report_exp_cpu_mult()
>	on the rcu_node structure corresponding to some
>	specific CPU, which will always be a leaf rcu_node
>	structure.
>
> Again, thank you very much for your testing and debugging efforts!
> I have queued the three (almost untested) patches below, the first of
> which I will fold into the buggy "rcu: React to callback overload by
> aggressively seeking quiescent states" patch, the second of which I will
> apply to prevent future bugs of this sort, even when running on small
> systems, and the third of which will allow me to easily run rcutorture
> tests on the larger systems that I have recently gained easy access to.
>
> And the big question...  Does the first patch clear up your hangs?  ;-)
> Either way, please let me know!
>
>							Thanx, Paul
>
> ------------------------------------------------------------------------
>
> commit e8d6182b015bdd8221164477f4ab1c307bd2fbe9
> Author: Paul E. McKenney <paulmck@xxxxxxxxxx>
> Date:   Sun Dec 15 10:59:06 2019 -0800
>
>     squash! rcu: React to callback overload by aggressively seeking quiescent states
>
>     [ paulmck: Fix bug located by Qian Cai. ]
>     Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxx>
>
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 1d0935e..48fba22 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -1760,10 +1760,11 @@ static void rcu_gp_cleanup(void)
>  		/* smp_mb() provided by prior unlock-lock pair. */
>  		needgp = rcu_future_gp_cleanup(rnp) || needgp;
>  		// Reset overload indication for CPUs no longer overloaded
> -		for_each_leaf_node_cpu_mask(rnp, cpu, rnp->cbovldmask) {
> -			rdp = per_cpu_ptr(&rcu_data, cpu);
> -			check_cb_ovld_locked(rdp, rnp);
> -		}
> +		if (rcu_is_leaf_node(rnp))
> +			for_each_leaf_node_cpu_mask(rnp, cpu, rnp->cbovldmask) {
> +				rdp = per_cpu_ptr(&rcu_data, cpu);
> +				check_cb_ovld_locked(rdp, rnp);
> +			}
>  		sq = rcu_nocb_gp_get(rnp);
>  		raw_spin_unlock_irq_rcu_node(rnp);
>  		rcu_nocb_gp_cleanup(sq);
>

This works fine.

Tested-by: Qian Cai <cai@xxxxxx>