On Tue, 6 Jan 2009, Linus Torvalds wrote: > > Sure - the owner may have rescheduled to another CPU, but if it did that, > then we really might as well sleep. .. or at least re-try the loop. It might be worth it to move the whole sleeping behavior into that helper function (mutex_spin_or_sleep()), and just make the logic be: - if we _enter_ the function with the owner not on a runqueue, then we sleep - otherwise, we spin as long as the owner stays on the same runqueue. and then we loop over the whole "get mutex spinlock and revalidate it all" after either sleeping or spinning for a while. That seems much simpler in all respects. And it should simplify the whole patch too, because it literally means that in the main __mutex_lock_common(), the only real difference is - __set_task_state(task, state); - spin_unlock_mutex(&lock->wait_lock, flags); - schedule(); becoming instead + mutex_spin_or_schedule(owner, task, lock, state); and then all the "spin-or-schedule" logic is in just that one simple routine (that has to look up the rq and drop the lock and re-take it). The "mutex_spin_or_schedule()" code would literally look something like void mutex_spin_or_schedule(owner, task, lock, state) { struct rq *owner_rq = owner->rq; if (owner_rq->curr != owner) { __set_task_state(task, state); spin_unlock_mutex(&lock->wait_lock, flags); schedule(); } else { spin_unlock_mutex(&lock->wait_lock, flags); do { cpu_relax(); while (lock->owner == owner && owner_rq->curr == owner); } spin_lock_mutex(&lock->wait_lock, flags); } or something. Btw, this also fixes a bug: your patch did + __set_task_state(task, state); + /* didnt get the lock, go to sleep: */ + schedule(); for the schedule case without holding the mutex spinlock. And that seems very buggy and racy indeed: since it doesn't hold the mutex lock, if the old owner releases the mutex at just the right point (yeah, yeah, it requires a scheduling event on another CPU in order to also miss the whole "task_is_current()" logic), the wakeup can get lost, because you set the state to sleeping perhaps _after_ the task just got woken up. So we stay sleeping even though the mutex is clear. So I'm going to NAK the original patch, and just -require- the cleanup I'm suggesting as also fixing what looks like a bug. Of course, once I see the actual patch, maybe I'm going to find something _else_ to kwetch about. Linus -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html