Re: [PATCH 1/1] rtmutex: Handle when top lock owner changes

"Brad Mouring" <bmouring@xxxxxx> · Wed, 4 Jun 2014 10:11:29 -0500

On Wed, Jun 04, 2014 at 10:58:23AM -0400, Steven Rostedt wrote:
> On Wed, 4 Jun 2014 09:38:30 -0500
> "Brad Mouring" <bmouring@xxxxxx> wrote:
> 
> > On Wed, Jun 04, 2014 at 10:16:12AM -0400, Steven Rostedt wrote:
> > > On Wed, 4 Jun 2014 08:05:25 -0500
> > > "Brad Mouring" <bmouring@xxxxxx> wrote:
> > > 
> > >  >          A->L2
> > > > 
> > > > This is a slight variation on what I was seeing. To use the nomenclature
> > > > that you proposed at the start, rewinding to the point
> > > > 
> > > >    A->L2->B->L3->C->L4->D
> > > > 
> > > > Let's assume things continue to unfold as you explain. Task is D,
> > > > top_waiter is C. A is scheduled out and the chain shuffles.
> > > > 
> > > >        A->L2->B
> > > > C->L4->D->'
> > > 
> > > But isn't that a lock ordering problem there?
> > > 
> > > If B can block on L3 owned by C, I see the following:
> > > 
> > >   B->L3->C->L4->D->L2->B
> > > 
> > > Deadlock!
> > Yes, it could be. But currently no one owns L3. B is currently not
> > blocked. Under these circumstances, there is no deadlock. Also, I
> > somewhat arbitrarily picked L4, it could be Lfoo that C blocks on
> > since the process is
> 
> OK, then you should have used L1, which basically makes it exactly my
> scenario ;-)

Heh, fair point.

> 
> > ...
> > waiter = D->pi_blocked_on
> > 
> > // waiter is real_waiter D->L2
> > 
> > // orig_waiter still there, orig_lock still has an owner
> > 
> > // top_waiter was pointing to C->L4, now points to C->Lfoo
> > // D does have top_waiters, and, as noted above, it aliased
> > // to encompass a different waiter scenario
> > 
> > > 
> > > In my scenario I was very careful to point out that the lock ordering
> > > was: L1->L2->L3->L4
> > > 
> > > But you show that we can have both:
> > > 
> > >    L2-> ... ->L4
> > > 
> > >     and
> > > 
> > >    L4-> ... ->L2
> > > 
> > > Which is a reverse of lock ordering and a possible deadlock can occur.
> > 
> > So the numbering/ordering of the locks is really somewhat arbitrary.
> > Here we *can* have L2-> ... ->L4 (if B decides to block on L2, it
> > could just as easily block on L8), and we absolutely have
> > L4-> ... ->L2. A deadlock *could* occur, but all of the traces that
> > I dug through, no actual deadlocks occurred.
> 
> Heh, but that shows the code is broken. I'm not saying that our
> deadlock detector is not returning false positives, I'm just stating
> that you probably need to fix your code.
> 
> Yes, you can have a locking order of L1 -> L2 and also L2 -> L1, and if
> you are lucky, that may never trigger any deadlocks. But why do you
> think the kernel folks have put so much effort into lockdep. Lockdep
> doesn't tell you that there is a deadlock (although it could), what it
> is so useful with is to tell us where there are possible deadlocks.
> 
> If your code does take L1 -> L2 and then L2 -> L1, you have a chance of
> hitting a deadlock right there. If you were to run the userspace
> lockdep, it would spit out a nice warning for you.

What I was saying is that the code can take L1 -> L2 and L2 -> Lfoo.
And, in fact, a quick glance back over my notes supports just this
behavior. It was unfortunate that I decided to come up with an example
without thinking it through first.
>
> But this is off topic, as I have shown that there exists an example
> that the userspace code would never deadlock but our deadlock detector
> would say it did.
> 
> -- Steve
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rt-users" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-rt-users" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html