On Fri, 2 Sep 2016, Peter Zijlstra wrote: > On Fri, Sep 02, 2016 at 02:10:13PM -0400, Alan Stern wrote: > > Paul, Peter, and Ingo: > > > > This must have come up before, but I don't know what was decided. > > > > Isn't it often true that a memory barrier is needed before a call to > > wake_up_process()? A typical scenario might look like this: > > > > CPU 0 > > ----- > > for (;;) { > > set_current_state(TASK_INTERRUPTIBLE); > > if (signal_pending(current)) > > break; > > if (wakeup_flag) > > break; > > schedule(); > > } > > __set_current_state(TASK_RUNNING); > > wakeup_flag = 0; > > > > > > CPU 1 > > ----- > > wakeup_flag = 1; > > wake_up_process(my_task); > > > > The underlying pattern is: > > > > CPU 0 CPU 1 > > ----- ----- > > write current->state write wakeup_flag > > smp_mb(); > > read wakeup_flag read my_task->state > > > > where set_current_state() does the write to current->state and > > automatically adds the smp_mb(), and wake_up_process() reads > > my_task->state to see whether the task needs to be woken up. > > > > The kerneldoc for wake_up_process() says that it has no implied memory > > barrier if it doesn't actually wake anything up. And even when it > > does, the implied barrier is only smp_wmb, not smp_mb. > > > > This is the so-called SB (Store Buffer) pattern, which is well known to > > require a full smp_mb on both sides. Since wake_up_process() doesn't > > include smp_mb(), isn't it correct that the caller must add it > > explicitly? > > > > In other words, shouldn't the code for CPU 1 really be: > > > > wakeup_flag = 1; > > smp_mb(); > > wake_up_process(task); > > > > No, it doesn't need to do that. try_to_wake_up() does the right thing. > > It does: > > smp_mb__before_spinlock(); > raw_spin_lock_irqsave(&p->pi_lock); > > Now, smp_mb__before_spinlock() is a bit of an odd duck, if you look at > its comment it says: > > /* > * Despite its name it doesn't necessarily has to be a full barrier. > * It should only guarantee that a STORE before the critical section > * can not be reordered with LOADs and STOREs inside this section. > * spin_lock() is the one-way barrier, this LOAD can not escape out > * of the region. So the default implementation simply ensures that > * a STORE can not move into the critical section, smp_wmb() should > * serialize it with another STORE done by spin_lock(). > */ > #ifndef smp_mb__before_spinlock > #define smp_mb__before_spinlock() smp_wmb() > #endif I see. So the kerneldoc for wake_up_process() is misleading at best. > So per default it ends up being: > > WMB > LOCK > > Which is sufficient to order the prior store vs the later load as is > required. Note that a spinlock acquire _must_ imply a store (we need to > mark the lock as taken), therefore the prior store is ordered against > the lock store per the wmb, and since the lock must imply an ACQUIRE > that limits the load. Actually, that's not entirely true (although presumably it works okay for most architectures). In theory, the first write and the WMB can move down after the lock's ACQUIRE. Then the later read could move earlier, before the lock's store, the WMB, and the first write, but still after the ACQUIRE. Thus the later read could be reordered with the first write. In other words, these actions: store wakeup_flag WMB load-ACQUIRE lock store lock read task->state can be reordered to: load-ACQUIRE lock store wakeup_flag WMB store lock read task->state and then to: load-ACQUIRE lock read task->state store wakeup_flag WMB store lock without violating any ordering rules. > Now, PowerPC defines smp_mb__before_spinlock as smp_mb(), and this is > because PowerPC ACQUIRE is a bit of an exception, if you want more > details I'm sure I or Paul can dredge them up :-) PPC can't do the reordering described above, but it does have other weaknesses. Without smp_mb(), the write to wakeup_flag doesn't have to propagate from one CPU to the other before the second CPU tries to read it. Felipe, your tests will show whether my guess was totally off-base. For the new people, Felipe is tracking down a problem that involves exactly the code sequence listed at the top of the email, where we know that the wakeup routine runs but nevertheless the task sleeps. At least, that's what it looks like at the moment. Alan Stern -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html