On Mon, Apr 10, 2017 at 03:00:34PM -0400, Alan Stern wrote: > On Mon, 10 Apr 2017, Paul E. McKenney wrote: > > > > > The ordering with current->state is sadly not relevant because it is > > > > only touched if wake_up() actually wakes the process up. > > > > > > Well, it is _written_ only if wake_up() actually wakes the process up. > > > But it is _read_ in every case. > > > > For wake_up_process(), agreed. But for wake_up(), if the process > > doing the wait_event() saw the changed state on the very first check, > > the waking process won't have any way of gaining a reference to the > > "awakened" task, so cannot access its ->state. > > True. But that would be okay, since the waiter has definitely seen the > changed state. I was concerned about the possibility that there was no > wakeup and the waiter did _not_ see the changed state. That's how you > get tasks staying asleep indefinitely when they should be running. > > > > It looks like the other wakeup pathways end up funnelling through > > > try_to_wake_up(), so this is true in general. > > > > Only for wake_up_process() and friends, not for wake_up() and friends. > > Again, although wake_up_process() unconditionally checks the awakened > > processm, wake_up() doesn't even have any way of knowing what process > > it woke up in the case where the "awakened" process didn't actually sleep. > > Like the above, this would be okay. > > > But even in the wake_up_process() case, don't we need the wait-side > > process to have appropriate barriers (or locks or whatever) manually > > inserted? > > Only for accesses among the driver's own variables. There's no need to > order the local variables against current->state; as we have seen, > that's all handled for us. Fair enough. However, the kernel documentation needs to consider the driver's own variables. > > > > > This also means that the analysis provided by Thinh Nguyen in the > > > > > original patch description is wrong. > > > > > > > > And that the bug is elsewhere? > > > > > > Presumably. On the other hand, Thinh Nguyen claimed to have narrowed > > > the problem down to this particular mechanism. The driver in question > > > in drivers/usb/gadget/function/f_mass_storage.c. The waker routines > > > are bulk_out_complete()/wakeup_thread(), which do: > > > > > > // bulk_out_complete() > > > bh->state = BH_STATE_FULL; > > > > > > // wakeup_thread() > > > smp_wmb(); /* ensure the write of bh->state is complete */ > > > /* Tell the main thread that something has happened */ > > > common->thread_wakeup_needed = 1; > > > if (common->thread_task) > > > wake_up_process(common->thread_task); > > > > > > and the waiters are get_next_command()/sleep_thread(), which do: > > > > > > // get_next_command() > > > while (bh->state == BH_STATE_EMPTY) { > > > > > > // sleep_thread() > > > for (;;) { > > > if (can_freeze) > > > try_to_freeze(); > > > set_current_state(TASK_INTERRUPTIBLE); > > > if (signal_pending(current)) { > > > rc = -EINTR; > > > break; > > > } > > > if (common->thread_wakeup_needed) > > > break; > > > schedule(); > > > } > > > __set_current_state(TASK_RUNNING); > > > common->thread_wakeup_needed = 0; > > > smp_rmb(); /* ensure the latest bh->state is visible */ > > > > > > } > > > > > > and he said that the problem was caused by the waiter seeing > > > thread_wakeup_needed == 0, so the wakeup was getting lost. > > > > > > Hmmm, I suppose it's possible that the waker's thread_wakeup_needed = 1 > > > could race with the waiter's thread_wakeup_needed = 0. If there are > > > two waits in quick succession, the second could get lost. The pattern > > > would be: > > > > > > bh->state = BH_STATE_FULL; > > > smp_wmb(); > > > thread_wakeup_needed = 0; thread_wakeup_needed = 1; > > > smp_rmb(); > > > if (bh->state != BH_STATE_FULL) > > > sleep again... > > > > > > This is the so-called R pattern, and it also needs full memory barriers > > > on both sides. The barriers we have are not sufficient. (This is an > > > indication that the driver's design needs to be re-thought.) As it is, > > > the waiter's thread_wakeup_needed = 0 can overwrite the waker's > > > thread_wakeup_needed = 1 while the waiter's read of bh->state then > > > fails to see the waker's write. (This analysis is similar to but > > > different from the one in the patch description.) > > > > > > To fix this problem, both the smp_rmb() in sleep_thread() and the > > > smp_wmb() in wakeup_thread() should be changed to smp_mb(). > > > > Good catch! > > > > However, if this failure was seen on x86, there is something else going > > on as well. > > Why do you say that? Isn't R-pattern reordering observable on x86? > It involves a write followed by a read, which is the sort of thing x86 > is able to reorder. You are right -- the smp_mb() is still needed on the write-then-read side, and either a barrier(), an smp_wmb(), or an smp_store_release() suffices for the pair-of-writes side. > Anyway, I don't know what type of system was used for testing. The > patch description didn't say. > > Alan Stern > -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html