On Wed, Oct 14, 2020 at 04:55:53PM -0700, Paul E. McKenney wrote:
> On Thu, Oct 15, 2020 at 12:39:54AM +0200, Peter Zijlstra wrote:
> > On Wed, Oct 14, 2020 at 03:11:52PM -0700, Paul E. McKenney wrote:
> > > On Wed, Oct 14, 2020 at 11:53:19PM +0200, Peter Zijlstra wrote:
> > > > On Wed, Oct 14, 2020 at 11:34:05AM -0700, Paul E. McKenney wrote:
> > > > > commit 7deaa04b02298001426730ed0e6214ac20d1a1c1
> > > > > Author: Paul E. McKenney <paulmck@xxxxxxxxxx>
> > > > > Date:   Tue Oct 13 12:39:23 2020 -0700
> > > > >
> > > > >     rcu: Prevent lockdep-RCU splats on lock acquisition/release
> > > > >
> > > > >     The rcu_cpu_starting() and rcu_report_dead() functions transition the
> > > > >     current CPU between online and offline state from an RCU perspective.
> > > > >     Unfortunately, this means that the rcu_cpu_starting() function's lock
> > > > >     acquisition and the rcu_report_dead() function's lock releases happen
> > > > >     while the CPU is offline from an RCU perspective, which can result in
> > > > >     lockdep-RCU splats about using RCU from an offline CPU.  In reality,
> > > > >     aside from the splats, both transitions are safe because a new grace
> > > > >     period cannot start until these functions release their locks.
> > > >
> > > > But we call the trace_* crud before we acquire the lock. Are you sure
> > > > that's a false-positive?
> > >
> > > You lost me on this one.
> > >
> > > I am assuming that you are talking about rcu_cpu_starting(), because
> > > that is the one where RCU is not initially watching, that is, the
> > > case where tracing before the lock acquisition would be a problem.
> > > You cannot be talking about rcu_cpu_starting() itself, because it does
> > > not do any tracing before acquiring the lock.  But if you are talking
> > > about the caller of rcu_cpu_starting(), then that caller should put the
> > > rcu_cpu_starting() before the tracing.  But that would be the other
> > > patch earlier in this thread that was proposing moving the call to
> > > rcu_cpu_starting() much earlier in CPU bringup.
> > >
> > > So what am I missing here?
> >
> >   rcu_cpu_starting();
> >   raw_spin_lock_irqsave();
> >     local_irq_save();
> >     preempt_disable();
> >     spin_acquire()
> >       lock_acquire()
> >         trace_lock_acquire() <--- *whoopsie-doodle*
> >           /* uses RCU for tracing */
> >     arch_spin_lock_flags() <--- the actual spinlock
>
> Gah!  Idiot here left out the most important part, so good catch!!!
> Much easier this way than finding out about it the hard way...
>
> I should have asked myself harder questions earlier today about moving
> the counter from the rcu_node structure to the rcu_data structure.
>
> Perhaps something like the following untested patch on top of the
> earlier patch?

Except that this is subtly flawed also.  The delay cannot be at
rcu_gp_cleanup() time because by the time we are working on the last
leaf rcu_node structure, callbacks might already have started being
invoked on CPUs corresponding to the earlier leaf rcu_node structures.
So the (untested) patch below (on top of the other two) moves the delay
to rcu_gp_init(), in particular, to the first loop that traverses only
the leaf rcu_node structures handling CPU hotplug.

Hopefully getting closer!

Oh, and the second smp_mb() added to rcu_gp_init() is probably redundant
given the full barrier implied by the later call to
raw_spin_lock_irq_rcu_node().  But one thing at a time...
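
To make the ordering requirement in Peter's call chain above concrete,
here is a minimal sketch, illustration only: example_lock and
example_cpu_bringup() are made-up names and this is not code from any
of the patches in this thread.

static DEFINE_RAW_SPINLOCK(example_lock);	/* made-up lock, for illustration only */

static void example_cpu_bringup(void)		/* made-up function, for illustration only */
{
	unsigned long flags;

	/* Tell RCU that this CPU is online and watching *before* any locking. */
	rcu_cpu_starting(smp_processor_id());

	/*
	 * With lockdep enabled, this acquisition goes through lock_acquire(),
	 * which fires trace_lock_acquire(), and tracepoints use RCU.  Taking
	 * a lock before rcu_cpu_starting() is what produces the lockdep-RCU
	 * "use from offline CPU" splat.
	 */
	raw_spin_lock_irqsave(&example_lock, flags);
	/* ... per-CPU bringup work ... */
	raw_spin_unlock_irqrestore(&example_lock, flags);
}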
							Thanx, Paul

------------------------------------------------------------------------

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 8b5215e..5904b63 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1725,6 +1725,7 @@ static void rcu_strict_gp_boundary(void *unused)
  */
 static bool rcu_gp_init(void)
 {
+	unsigned long firstseq;
 	unsigned long flags;
 	unsigned long oldmask;
 	unsigned long mask;
@@ -1768,6 +1769,12 @@ static bool rcu_gp_init(void)
 	 */
 	rcu_state.gp_state = RCU_GP_ONOFF;
 	rcu_for_each_leaf_node(rnp) {
+		smp_mb(); // Pair with barriers used when updating ->ofl_seq to odd values.
+		firstseq = READ_ONCE(rnp->ofl_seq);
+		if (firstseq & 0x1)
+			while (firstseq == smp_load_acquire(&rnp->ofl_seq))
+				schedule_timeout_idle(1); // Can't wake unless RCU is watching.
+		smp_mb(); // Pair with barriers used when updating ->ofl_seq to even values.
 		raw_spin_lock(&rcu_state.ofl_lock);
 		raw_spin_lock_irq_rcu_node(rnp);
 		if (rnp->qsmaskinit == rnp->qsmaskinitnext &&
@@ -1982,7 +1989,6 @@ static void rcu_gp_fqs_loop(void)
 static void rcu_gp_cleanup(void)
 {
 	int cpu;
-	unsigned long firstseq;
 	bool needgp = false;
 	unsigned long gp_duration;
 	unsigned long new_gp_seq;
@@ -2020,12 +2026,6 @@ static void rcu_gp_cleanup(void)
 	new_gp_seq = rcu_state.gp_seq;
 	rcu_seq_end(&new_gp_seq);
 	rcu_for_each_node_breadth_first(rnp) {
-		smp_mb(); // Pair with barriers used when updating ->ofl_seq to odd values.
-		firstseq = READ_ONCE(rnp->ofl_seq);
-		if (firstseq & 0x1)
-			while (firstseq == smp_load_acquire(&rnp->ofl_seq))
-				schedule_timeout_idle(1); // Can't wake unless RCU is watching.
-		smp_mb(); // Pair with barriers used when updating ->ofl_seq to even values.
 		raw_spin_lock_irq_rcu_node(rnp);
 		if (WARN_ON_ONCE(rcu_preempt_blocked_readers_cgp(rnp)))
 			dump_blkd_tasks(rnp, 10);
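
For context when reading the new wait loop in rcu_gp_init() above, a
rough sketch of the update side it is assumed to pair with (the earlier
->ofl_seq patches are not shown in this message, so the helper name and
exact barrier placement below are guesses for illustration, not the
actual code):

/*
 * Sketch only: how rcu_cpu_starting()/rcu_report_dead() are assumed to
 * bracket an online/offline transition with ->ofl_seq updates.
 */
static void ofl_seq_transition_sketch(struct rcu_node *rnp)
{
	WRITE_ONCE(rnp->ofl_seq, rnp->ofl_seq + 1);	/* Odd: transition in progress. */
	WARN_ON_ONCE(!(rnp->ofl_seq & 0x1));
	smp_mb();	/* Barrier for the odd-value update; pairs with rcu_gp_init()'s first smp_mb(). */

	/* ... update ->qsmaskinitnext and related online/offline state ... */

	smp_mb();	/* Barrier for the even-value update; pairs with rcu_gp_init()'s second smp_mb(). */
	WRITE_ONCE(rnp->ofl_seq, rnp->ofl_seq + 1);	/* Even: transition complete. */
	WARN_ON_ONCE(rnp->ofl_seq & 0x1);
}

The intent, as read from the barrier comments in the patch, is that a
grace-period kthread seeing an even ->ofl_seq value in rcu_gp_init()
also sees the completed transition, while one seeing an odd value waits
for the transition to finish before sampling state under the locks.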