On 01/29, Linus Torvalds wrote: > > But yes, if you're talking about TASK_UNINTERRUPTIBLE, we do need to > just remove the setting of that entirely. It needs to be set *before* > adding us to the list, not after. That's just a bug - we get woken up > when we've been given the lock. Yes, I think this should work although I am not familiar with this code. If we remove set_task_state() from the main waiting loop we can never race with __rwsem_do_wake()->try_to_wake_up() seeing us in UNINTERRUPTIBLE state. rwsem_down_failed_common() simply can't return until UNINTERRUPTIBLE->RUNNING transition is finished (__rwsem_do_wake does wakeup first). And since we do not play with current->state after spin_unlock(), it is fine to "race" with waiter->task clearing, just we can do the unnecessary but harmless schedule() in TASK_RUNNING. > So it may be completely and utterly broken for some subtle reason, Well, what about another spurious wakeup from somewhere? In this case rwsem_down_failed_common() will do a busy-wait loop. > @@ -92,10 +92,9 @@ __rwsem_do_wake(struct rw_semaphore *sem, int wake_type) > */ > list_del(&waiter->list); > tsk = waiter->task; > + wake_up_process(tsk); > smp_mb(); > waiter->task = NULL; OK, now I understand why do we need "clear after wakeup". But then I don't really understand this mb, perhaps wmb() is enough? Afaics we only need to ensure we change waiter->task after changing task's state. OTOH, > @@ -183,7 +181,6 @@ rwsem_down_failed_common(struct rw_semaphore *sem, > raw_spin_lock_irq(&sem->wait_lock); > waiter.task = tsk; > waiter.flags = flags; > - get_task_struct(tsk); > > if (list_empty(&sem->wait_list)) > adjustment += RWSEM_WAITING_BIAS; > @@ -211,11 +208,8 @@ rwsem_down_failed_common(struct rw_semaphore *sem, > if (!waiter.task) > break; > schedule(); > - set_task_state(tsk, TASK_UNINTERRUPTIBLE); > } > > - tsk->state = TASK_RUNNING; > - > return sem; > } Suppose that this task does tsk->state = TASK_WHATEVER after that. It seems that we need mb() before return, otherwise the next ->state change can be reordered with "if (!waiter.task)" above. Or not? Oleg. -- To unsubscribe from this list: send the line "unsubscribe linux-tip-commits" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html