On 05/13, Paul E. McKenney wrote:
>
> On Tue, May 13, 2014 at 08:57:42PM +0200, Oleg Nesterov wrote:
> > On 05/13, Paul E. McKenney wrote:
> > >
> > > On Tue, May 13, 2014 at 05:44:35PM +0200, Peter Zijlstra wrote:
> > > >
> > > > Ah, yes, so I'll defer to Oleg and Linus to explain that one. As per the
> > > > name: smp_mb__before_spinlock() should of course imply a full barrier.
> > >
> > > How about if I queue a name change to smp_wmb__before_spinlock()?
> >
> > I agree, this is more accurate, simply because it describes what it
> > actually does.
> >
> > But just in case, as for try_to_wake_up() it does not actually need
> > wmb() between "CONDITION = T" and "task->state = RUNNING". It would
> > be fine if these 2 STORE's are re-ordered, we can rely on rq->lock.
> >
> > What it actually needs is a barrier between "CONDITION = T" and
> > "task->state & state" check. But since we do not have a store-load
> > barrier, wmb() was added to ensure that "CONDITION = T" can't leak
> > into the critical section.
> >
> > But it seems that set_tlb_flush_pending() already assumes that it
> > acts as wmb(), so probably smp_wmb__before_spinlock() is fine.
>
> Except that when I go to make the change, I find the following in
> the documentation:
>
>     Memory operations issued before the ACQUIRE may be completed after
>     the ACQUIRE operation has completed. An smp_mb__before_spinlock(),
>     combined with a following ACQUIRE, orders prior loads against
>     subsequent loads and stores and also orders prior stores against
>     subsequent stores. Note that this is weaker than smp_mb()! The
>     smp_mb__before_spinlock() primitive is free on many architectures.
>
> Which means that either the documentation is wrong or the implementation
> is. Yes, smp_wmb() has the semantics called out above on many platforms,
> but not on Alpha or ARM.

Well, I think the documentation is wrong in any case.

"prior loads against subsequent loads" is not true. And it does not
document that the initial goal was "prior stores against subsequent
loads".

"prior stores against subsequent stores" is obviously true for the
default implementation, but this is only a "side effect" of the fact
that it uses wmb(). The only intent of the wmb() added by 04e2f174
("Add memory barrier semantics to wake_up() & co") was, afaics at
least, to make sure that ttwu() does not read p->state before the
preceding stores have completed.

e0acd0a68e ("sched: fix the theoretical signal_wake_up() vs schedule()
race") added the new helper for documentation purposes, to explain
that the default implementation abuses wmb() to achieve the
serialization above.

> So, as you say, set_tlb_flush_pending() only relies on smp_wmb().

The comment says so ;) And this means that even if we suddenly got a
new load_store() barrier (which could work for ttwu/schedule), we
could no longer change smp_mb__before_spinlock() to use it.

> The comment in try_to_wake_up() seems to be assuming a full memory
> barrier. The comment in __schedule() also seems to be relying on
> a full memory barrier (prior write against subsequent read). Yow!

Well yes, but see above. Again, we need a load_store() barrier before
reading p->state, which we do not have; wmb() before spin_lock() can
be used instead.

But try_to_wake_up() and __schedule() do not need a full barrier in
the sense that, if we are going to wake this task up (or just clear
its ->state), then "CONDITION = T" can be delayed until spin_unlock().
We do not care if the task misses CONDITION in this case; it will call
schedule(), which will take the same lock.
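For reference, the waker side of this pattern looks roughly like the
sketch below (a trimmed paraphrase of try_to_wake_up(), from memory
and with the error paths and the actual wakeup elided, so don't take
it as the exact source):

    static int try_to_wake_up(struct task_struct *p, unsigned int state,
                              int wake_flags)
    {
        unsigned long flags;
        int success = 0;

        /*
         * Pairs with the mb() in set_current_state(): the caller's
         * "CONDITION = T" store must not be reordered with the
         * p->state check below.  The default implementation is wmb();
         * combined with the ACQUIRE of pi_lock it keeps the prior
         * stores from leaking into the critical section.
         */
        smp_mb__before_spinlock();
        raw_spin_lock_irqsave(&p->pi_lock, flags);
        if (!(p->state & state))
            goto out;

        /* ... the actual wakeup ... */
        success = 1;
    out:
        raw_spin_unlock_irqrestore(&p->pi_lock, flags);
        return success;
    }

This is the "wmb() before spin_lock()" site mentioned above.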
But if we are not going to wake it up, we need to ensure that the
task can't miss CONDITION. IOW, this all is simply about the

    CONDITION = T;                  current->state = TASK_XXX;
    mb();
    if (p->state)                   if (!CONDITION)
        wake_it_up();                   schedule();

race.

Oleg.
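P.S. The two-column diagram above is just the classic store-buffering
test, so the "missed wakeup" outcome can be shown from userspace with
C11 atomics. This is a toy, nothing kernel-specific, and all the names
are invented for the demo:

    /* sb_demo.c -- build with: gcc -O2 -pthread sb_demo.c */
    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdio.h>

    #define ITERS            200000
    #define TASK_RUNNING          0
    #define TASK_INTERRUPTIBLE    1

    static atomic_int cond_flag;                  /* "CONDITION" */
    static atomic_int task_state;                 /* "p->state"  */
    static int waker_saw_state, sleeper_saw_cond;
    static pthread_barrier_t start_b, end_b;

    static void *waker(void *arg)
    {
        (void)arg;
        for (int i = 0; i < ITERS; i++) {
            pthread_barrier_wait(&start_b);
            /* CONDITION = T */
            atomic_store_explicit(&cond_flag, 1, memory_order_relaxed);
            /* the mb() on the waker's side */
            atomic_thread_fence(memory_order_seq_cst);
            /* if (p->state) wake_it_up(); */
            waker_saw_state = atomic_load_explicit(&task_state,
                                                   memory_order_relaxed);
            pthread_barrier_wait(&end_b);
        }
        return NULL;
    }

    static void *sleeper(void *arg)
    {
        long missed = 0;

        (void)arg;
        for (int i = 0; i < ITERS; i++) {
            pthread_barrier_wait(&start_b);
            /* current->state = TASK_XXX */
            atomic_store_explicit(&task_state, TASK_INTERRUPTIBLE,
                                  memory_order_relaxed);
            /* the mb() in set_current_state() */
            atomic_thread_fence(memory_order_seq_cst);
            /* if (!CONDITION) schedule(); */
            sleeper_saw_cond = atomic_load_explicit(&cond_flag,
                                                    memory_order_relaxed);
            pthread_barrier_wait(&end_b);

            /*
             * Neither side saw the other's store: the waker skipped
             * the wakeup and the task went to sleep anyway.
             */
            if (waker_saw_state == TASK_RUNNING && !sleeper_saw_cond)
                missed++;

            /* reset for the next round */
            atomic_store(&cond_flag, 0);
            atomic_store(&task_state, TASK_RUNNING);
        }
        printf("missed wakeups: %ld\n", missed);
        return NULL;
    }

    int main(void)
    {
        pthread_t w, s;

        pthread_barrier_init(&start_b, NULL, 2);
        pthread_barrier_init(&end_b, NULL, 2);
        pthread_create(&w, NULL, waker, NULL);
        pthread_create(&s, NULL, sleeper, NULL);
        pthread_join(w, NULL);
        pthread_join(s, NULL);
        return 0;
    }

With both seq_cst fences in place the count stays at zero. Downgrade
either fence to memory_order_release (i.e. the wmb() we were talking
about) and the missed-wakeup outcome becomes legal, and it is easy to
observe even on x86, where a plain store can still be reordered with
a later load.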