Re: [PATCH 2/3] rcu: Equip sleepable RCU with lockdep dependency graph checks

Boqun Feng <boqun.feng@xxxxxxxxx> · Fri, 13 Jan 2023 23:32:01 -0800

On Sat, Jan 14, 2023 at 03:18:32PM +0800, Hillf Danton wrote:
> On Fri, 13 Jan 2023 16:17:59 -0800 Boqun Feng <boqun.feng@xxxxxxxxx>
> > On Sat, Jan 14, 2023 at 07:58:09AM +0800, Hillf Danton wrote:
> > > On 13 Jan 2023 09:58:10 -0800 Boqun Feng <boqun.feng@xxxxxxxxx>
> > > > On Fri, Jan 13, 2023 at 09:03:30PM +0800, Hillf Danton wrote:
> > > > > On 12 Jan 2023 22:59:54 -0800 Boqun Feng <boqun.feng@xxxxxxxxx>
> > > > > > --- a/kernel/rcu/srcutree.c
> > > > > > +++ b/kernel/rcu/srcutree.c
> > > > > > @@ -1267,6 +1267,8 @@ static void __synchronize_srcu(struct srcu_struct *ssp, bool do_norm)
> > > > > >  {
> > > > > >  	struct rcu_synchronize rcu;
> > > > > >  
> > > > > > +	srcu_lock_sync(&ssp->dep_map);
> > > > > > +
> > > > > >  	RCU_LOCKDEP_WARN(lockdep_is_held(ssp) ||
> > > > > >  			 lock_is_held(&rcu_bh_lock_map) ||
> > > > > >  			 lock_is_held(&rcu_lock_map) ||
> > > > > > -- 
> > > > > > 2.38.1
> > > > > 
> > > > > The following deadlock is able to escape srcu_lock_sync() because the
> > > > > __lock_release folded in sync leaves one lock on the sync side.
> > > > > 
> > > > > 	cpu9		cpu0
> > > > > 	---		---
> > > > > 	lock A		srcu_lock_acquire(&ssp->dep_map);
> > > > > 	srcu_lock_sync(&ssp->dep_map);
> > > > > 			lock A
> > > > 
> > > > But isn't it just the srcu_mutex_ABBA test case in patch #3, and my run
> > > > of lockdep selftest shows we can catch it. Anything subtle I'm missing?
> > > 
> > > I am leaning to not call it ABBA deadlock, because B is unlocked.
> > > 
> > > 	task X		task Y
> > > 	---		---
> > > 	lock A
> > > 	lock B
> > > 			lock B
> > > 	unlock B
> > > 	wait_for_completion E
> > > 
> > > 			lock A
> > > 			complete E
> > > 
> > > And no deadlock should be detected/caught after B goes home.
> > 
> > Your example makes me more confused.. given the case:
> > 
> > 	task X		task Y
> > 	---		---
> > 	mutex_lock(A);
> > 			srcu_read_lock(B);
> > 	synchronize_srcu(B);
> > 			mutex_lock(A);
> > 
> > isn't it a deadlock?
> 
> Yes and nope, see below.
> 
> > In your example, A, B or E which one is srcu?
> 
> A and B are mutex, and E is completion in my example to show the failure
> of catching deadlock in case of non-fake lock. Now see srcu after your change.
> 
>  	task X			task Y
>  	---			---
>  	mutex_lock(A);
>  				srcu_read_lock(B);
> 				srcu_lock_acquire(&B->dep_map);
> 				a) lock_map_acquire_read(&B->dep_map);
>  	synchronize_srcu(B);
> 	__synchronize_srcu(B);
> 	srcu_lock_sync(&B->dep_map);
> 	lock_map_sync(&B->dep_map);
> 	lock_sync(&B->dep_map);
> 	__lock_acquire(&B->dep_map);

At this point, lockdep adds the dependency A -> B to the dependency graph.

> 				b) lock_map_acquire_read(&B->dep_map);
> 	__lock_release(&B->dep_map);
> 				c) lock_map_acquire_read(&B->dep_map);
>  				mutex_lock(A);

and here, lockdep will try to add the dependency B -> A to the dependency
graph, and find that A -> B -> A forms a cycle (with strong
dependencies), therefore lockdep knows it's a deadlock.

>  
> No deadlock could be detected if taskY takes mutexA after taskX releases B,

The timing of taskX releasing B doesn't matter, since lockdep uses the
dependency graph to detect potential deadlocks rather than detecting
them after the fact.
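
To illustrate why the release timing is irrelevant, here is a minimal
user-space sketch of graph-based detection. It is a toy model, not
lockdep's implementation: the toy_* names, the fixed-size adjacency
matrix and the "refuse the edge on a cycle" policy are all assumptions
made for illustration, and it ignores the strong/weak dependency
distinction that real lockdep makes for readers.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_CLASSES 8

/* Toy graph (hypothetical): dep[a][b] means "b was acquired while a was held". */
struct toy_lockdep {
	bool dep[MAX_CLASSES][MAX_CLASSES];
};

static void toy_init(struct toy_lockdep *ld)
{
	memset(ld, 0, sizeof(*ld));
}

/* Depth-first search: is 'to' reachable from 'from' over recorded edges? */
static bool toy_reachable(const struct toy_lockdep *ld, int from, int to,
			  bool *visited)
{
	if (from == to)
		return true;
	visited[from] = true;
	for (int next = 0; next < MAX_CLASSES; next++) {
		if (ld->dep[from][next] && !visited[next] &&
		    toy_reachable(ld, next, to, visited))
			return true;
	}
	return false;
}

/*
 * Record the edge held -> acquired. Returns false, without adding the
 * edge, if a path acquired -> ... -> held already exists, i.e. the new
 * edge would close a cycle. Edges are never removed on release.
 */
static bool toy_add_dependency(struct toy_lockdep *ld, int held, int acquired)
{
	bool visited[MAX_CLASSES] = { false };

	assert(held >= 0 && held < MAX_CLASSES);
	assert(acquired >= 0 && acquired < MAX_CLASSES);

	if (toy_reachable(ld, acquired, held, visited))
		return false;
	ld->dep[held][acquired] = true;
	return true;
}

enum { MUTEX_A, SRCU_B };

/* Returns true if the toy model flags the scenario from this thread. */
static bool toy_srcu_mutex_scenario(void)
{
	struct toy_lockdep ld;

	toy_init(&ld);

	/*
	 * task X: mutex_lock(A); synchronize_srcu(B);
	 * The sync side acquires and immediately releases B, but the
	 * edge A -> B stays in the graph.
	 */
	bool ok_x = toy_add_dependency(&ld, MUTEX_A, SRCU_B);

	/*
	 * task Y, at any later time: srcu_read_lock(B); mutex_lock(A);
	 * The edge B -> A would close the cycle A -> B -> A.
	 */
	bool ok_y = toy_add_dependency(&ld, SRCU_B, MUTEX_A);

	return ok_x && !ok_y;
}
```

In this sketch toy_srcu_mutex_scenario() returns true: the second edge
is rejected even though nothing is "held" on the sync side any more,
because the check looks only at the recorded edges, not at who holds
what at that moment.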

> and how taskY acquires B does not matter as per the a), b) and c) modes in
> the above chart, again because releasing lock can break deadlock in general.

I have test cases showing that the above deadlock can be detected, so if
you think there is a deadlock that may escape my change, feel free to
add a test case in lib/locking-selftest.c ;-)

Regards,
Boqun