On Wed, Jun 04, 2014 at 10:58:23AM -0400, Steven Rostedt wrote: > On Wed, 4 Jun 2014 09:38:30 -0500 > "Brad Mouring" <bmouring@xxxxxx> wrote: > > > On Wed, Jun 04, 2014 at 10:16:12AM -0400, Steven Rostedt wrote: > > > On Wed, 4 Jun 2014 08:05:25 -0500 > > > "Brad Mouring" <bmouring@xxxxxx> wrote: > > > > > > > A->L2 > > > > > > > > This is a slight variation on what I was seeing. To use the nomenclature > > > > that you proposed at the start, rewinding to the point > > > > > > > > A->L2->B->L3->C->L4->D > > > > > > > > Let's assume things continue to unfold as you explain. Task is D, > > > > top_waiter is C. A is scheduled out and the chain shuffles. > > > > > > > > A->L2->B > > > > C->L4->D->' > > > > > > But isn't that a lock ordering problem there? > > > > > > If B can block on L3 owned by C, I see the following: > > > > > > B->L3->C->L4->D->L2->B > > > > > > Deadlock! > > Yes, it could be. But currently no one owns L3. B is currently not > > blocked. Under these circumstances, there is no deadlock. Also, I > > somewhat arbitrarily picked L4, it could be Lfoo that C blocks on > > since the process is > > OK, then you should have used L1, which basically makes it exactly my > scenario ;-) Heh, fair point. > > > ... > > waiter = D->pi_blocked_on > > > > // waiter is real_waiter D->L2 > > > > // orig_waiter still there, orig_lock still has an owner > > > > // top_waiter was pointing to C->L4, now points to C->Lfoo > > // D does have top_waiters, and, as noted above, it aliased > > // to encompass a different waiter scenario > > > > > > > > In my scenario I was very careful to point out that the lock ordering > > > was: L1->L2->L3->L4 > > > > > > But you show that we can have both: > > > > > > L2-> ... ->L4 > > > > > > and > > > > > > L4-> ... ->L2 > > > > > > Which is a reverse of lock ordering and a possible deadlock can occur. > > > > So the numbering/ordering of the locks is really somewhat arbitrary. > > Here we *can* have L2-> ... ->L4 (if B decides to block on L2, it > > could just as easily block on L8), and we absolutely have > > L4-> ... ->L2. A deadlock *could* occur, but all of the traces that > > I dug through, no actual deadlocks occurred. > > Heh, but that shows the code is broken. I'm not saying that our > deadlock detector is not returning false positives, I'm just stating > that you probably need to fix your code. > > Yes, you can have a locking order of L1 -> L2 and also L2 -> L1, and if > you are lucky, that may never trigger any deadlocks. But why do you > think the kernel folks have put so much effort into lockdep. Lockdep > doesn't tell you that there is a deadlock (although it could), what it > is so useful with is to tell us where there are possible deadlocks. > > If your code does take L1 -> L2 and then L2 -> L1, you have a chance of > hitting a deadlock right there. If you were to run the userspace > lockdep, it would spit out a nice warning for you. What I was saying is that the code can take L1 -> L2 and L2 -> Lfoo. And, in fact, a quick glance back over my notes supports just this behavior. It was unfortunate that I decided to come up with an example without thinking it through first. > > But this is off topic, as I have shown that there exists an example > that the userspace code would never deadlock but our deadlock detector > would say it did. > > -- Steve > -- > To unsubscribe from this list: send the line "unsubscribe linux-rt-users" in > the body of a message to majordomo@xxxxxxxxxxxxxxx > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-rt-users" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html