On Thu, 8 Jan 2009, Linus Torvalds wrote:
>
> And I don't even believe that is the bug. I suspect the bug is simpler.
>
> I think the "need_resched()" needs to go in the outer loop, or at least
> happen in the "!owner" case. Because at least with preemption, what can
> happen otherwise is
>
>  - process A gets the lock, but gets preempted before it sets lock->owner.
>
>    End result: count = 0, owner = NULL.
>
>  - processes B/C goes into the spin loop, filling up all CPU's (assuming
>    dual-core here), and will now both loop forever if they hold the kernel
>    lock (or have some other preemption disabling thing over their down()).
>
> And all the while, process A would _happily_ set ->owner, and eventually
> release the mutex, but it never gets to run to do either of them so.
>
> In fact, you might not even need a process C: all you need is for B to be
> on the same runqueue as A, and having enough load on the other CPU's that
> A never gets migrated away. So "C" might be in user space.
>
> I dunno. There are probably variations on the above.

Ouch! I think you are on to something:

	for (;;) {
		struct thread_info *owner;

		old_val = atomic_cmpxchg(&lock->count, 1, 0);
		if (old_val == 1) {
			lock_acquired(&lock->dep_map, ip);
			mutex_set_owner(lock);
			return 0;
		}

		if (old_val < 0 && !list_empty(&lock->wait_list))
			break;

		/* See who owns it, and spin on him if anybody */
		owner = ACCESS_ONCE(lock->owner);

The owner was preempted before assigning lock->owner (as you stated).

		if (owner && !spin_on_owner(lock, owner))
			break;

We just spin :-(

I think adding the:

+		if (need_resched())
+			break;

would solve the problem.

Thanks,

-- Steve

		cpu_relax();
	}
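
For anyone who wants to poke at the failure mode outside the kernel, here is
a minimal single-threaded userspace sketch of the loop above. It is only a
model under stated assumptions, not the kernel code: struct model_mutex,
enum spin_result, model_spin() and its need_resched, check_resched and
max_iters parameters are all hypothetical names made up for illustration.
The atomic_cmpxchg() is replaced by a plain compare, the spin_on_owner()
path is collapsed into "leave the loop", and an iteration budget stands in
for "spins forever".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of the lock state; not the kernel's struct mutex. */
struct model_mutex {
	int count;         /* 1 = unlocked, 0 = locked, < 0 = locked with waiters */
	const void *owner; /* NULL until the holder has recorded itself */
};

enum spin_result {
	SPIN_ACQUIRED, /* fast path took the lock */
	SPIN_GIVE_UP,  /* left the spin loop (to sleep or reschedule) */
	SPIN_STUCK     /* iteration budget exhausted: models spinning forever */
};

/*
 * Model of the spin loop. need_resched stands in for the kernel's
 * need_resched(); check_resched selects whether the proposed test is
 * present in the loop; max_iters bounds the simulation.
 */
static enum spin_result model_spin(struct model_mutex *lock, bool need_resched,
				   bool check_resched, unsigned long max_iters)
{
	assert(lock != NULL);

	for (unsigned long i = 0; i < max_iters; i++) {
		/* Stands in for atomic_cmpxchg(&lock->count, 1, 0). */
		if (lock->count == 1) {
			lock->count = 0;
			return SPIN_ACQUIRED;
		}

		/* Simplification: any negative count means waiters are queued. */
		if (lock->count < 0)
			return SPIN_GIVE_UP;

		/* Simplification: the real code spins on the owner here. */
		if (lock->owner != NULL)
			return SPIN_GIVE_UP;

		/* count == 0, owner == NULL: holder preempted before setting owner. */
		if (check_resched && need_resched)
			return SPIN_GIVE_UP;

		/* cpu_relax() would go here. */
	}

	return SPIN_STUCK;
}
```

With count = 0 and owner = NULL, calling model_spin() with check_resched set
to false runs out its budget and returns SPIN_STUCK, while the same call with
check_resched and need_resched both true returns SPIN_GIVE_UP on the first
pass, which is the behaviour the two added lines are meant to give.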