On Mon, 10 Apr 2017, Felipe Balbi wrote:

> Hi,
>
> Thinh Nguyen <Thinh.Nguyen@xxxxxxxxxxxx> writes:
> > f_mass_storage has a memory barrier issue with the sleep and wake
> > functions that can cause a deadlock. This results in intermittent hangs
> > during MSC file transfer. The host will reset the device after receiving
> > no response to resume the transfer. This issue is seen when dwc3 is
> > processing 2 transfer-in-progress events at the same time, invoking
> > completion handlers for CSW and CBW. Whether the issue occurs also
> > depends on system timing and latency.
> >
> > To increase the chance of hitting this issue, you can force the dwc3
> > driver to wait for and process those 2 events at once by adding a small
> > delay (~100us) in dwc3_check_event_buf() whenever the request is for
> > CSW, and then reading the event count again. Avoid debugging with printk
> > and ftrace, as the extra delays and memory barriers will mask this issue.
> >
> > Scenario which can lead to failure:
> > -----------------------------------
> > 1) The main thread sleeps and waits for the next command.
> > 2) bulk_in_complete() wakes up the main thread for CSW.
> > 3) bulk_out_complete() tries to wake up the running main thread for CBW.
> > 4) thread_wakeup_needed is not loaded with the correct value in
> >    sleep_thread().
> > 5) The main thread goes to sleep.
> >
> > See more detail below:
> >
> > Note the 2 critical variables.
> >  * common->thread_wakeup_needed
> >  * bh->state
> >
> >  Eval'd | CPU0 (main thread)                       | CPU1
> > --------+------------------------------------------+----------------------
> >         | get_next_command()                       |
> >         | ...                                      |
> > while(1)| while (bh->state != BUF_STATE_FULL) {    |
> >         |  for (;;) {                              |
> >         |   set_current_state(TASK_INTERRUPTIBLE); |
> >  if(0)  |   if (thread_wakeup_needed)              |
> >         |   schedule()                             |
> >         |                                          | bulk_in_complete() {
> >         |                                          |  smp_wmb();
> >         |                                          |  bh->state = BUF_STATE_EMPTY
> >         |                                          |  smp_wmb();
> >         |                                          |  thread_wakeup_needed = 1
> >         |                                          |  wake_up_process()
> >         |                                          | }
> >         |  /* 2nd for loop iteration */            |
> >         |  for (;;) {                              |
> >         |   set_current_state(TASK_INTERRUPTIBLE); |
> >  if(1)  |   if (thread_wakeup_needed)              |
> >         |    break                                 |
> >         |  }                                       |
> >         |                                          | bulk_out_complete() {
> >         |                                          |  smp_wmb();
> >         | __set_current_state(TASK_RUNNING);       |
> >         | thread_wakeup_needed = 0;                |
> >         | smp_rmb();                               |
> >         |                                          |
> >         | /* 2nd while loop iteration */           |
> > while(1)| while (bh->state != BUF_STATE_FULL) {    |
> >         |                                          |  bh->state = BUF_STATE_FULL;
> >         |                                          |  smp_wmb();
> >         |                                       -->|  thread_wakeup_needed = 1
> >         |                                          |  wake_up_process()
> >         |                                          | }
> >         |  for (;;) {                              |
> >         |   set_current_state(TASK_INTERRUPTIBLE); |
> >  if(0)  |   if (thread_wakeup_needed)              |<--
> >         |   schedule()                             |
> >         |                                          |
> >         | <THREAD GOES TO SLEEP AND WAITS>         |
> >
> > thread_wakeup_needed is not guaranteed to be loaded before checking it.
> > Note that this happens when wake_up_process() is called on a running
> > process.
> >
> > This patch fixes the issue by explicitly using smp_mb() after the clearing
> > of thread_wakeup_needed in the sleep function. It serializes the order in
> > which thread_wakeup_needed is loaded and stored, thus preventing the wrong
> > value from being read when the sleeper checks it.
> >
> > However, a better solution in the future would be to use the wait_queue
> > mechanism, which takes care of the memory barriers between waker and
> > waiter.
> >
> > See the DEFINE_WAIT_FUNC comment in the kernel scheduler's wait.c, as
> > this solution is similar to its implementation.
> >
> > Signed-off-by: Thinh Nguyen <thinhn@xxxxxxxxxxxx>
> > ---
> >  drivers/usb/gadget/function/f_mass_storage.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/usb/gadget/function/f_mass_storage.c b/drivers/usb/gadget/function/f_mass_storage.c
> > index 4c8aacc232c0..f40433eb8259 100644
> > --- a/drivers/usb/gadget/function/f_mass_storage.c
> > +++ b/drivers/usb/gadget/function/f_mass_storage.c
> > @@ -627,7 +627,7 @@ static int sleep_thread(struct fsg_common *common, bool can_freeze)
> >  	}
> >  	__set_current_state(TASK_RUNNING);
> >  	common->thread_wakeup_needed = 0;
> > -	smp_rmb();	/* ensure the latest bh->state is visible */
> > +	smp_mb();	/* ensure the latest bh->state is visible */
> >  	return rc;
> >  }
> >
> > --
> > 2.11.0
>
> Alan, do you have any comments here? Do we really need a full barrier in
> this situation?

Peter Z has pointed out in the past that the locking and barrier
mechanisms in f_mass_storage are not well thought out. He also suggested
using a wait_queue rather than a home-made sleep/wake mechanism. I agree
that this would simplify the code.

However, in this case I do not believe the bug described above will be
fixed by changing the smp_rmb() to smp_mb() or by going to a wait_queue.
Here's my reasoning; maybe I'm wrong...

The important part of this race can be written schematically like so:

	CPU 0					CPU 1

A:	thread_wakeup_needed = 0;
	smp_rmb();
	...
						B: thread_wakeup_needed = 1;
						C: wake_up_process();
D:	set_current_state(TASK_INTERRUPTIBLE);
E:	if (thread_wakeup_needed)
		break;
	schedule();

Now, set_current_state() uses smp_store_mb(), so D is equivalent to:

D:	current->state = TASK_INTERRUPTIBLE;
	smp_mb();

Furthermore, wake_up_process(task) must read task->state in order to see
whether task needs to be woken up. HOWEVER, the kerneldoc for
wake_up_process() says:

 * It may be assumed that this function implies a write memory barrier before
 * changing the task state if and only if any tasks are woken up.

(Side point: this says _write_ memory barrier, whereas what is needed
here is a _full_ memory barrier.)

So if CPU 0 was not yet in TASK_INTERRUPTIBLE when CPU 1 ran
wake_up_process(), then CPU 1 would not execute any barriers and we would
have:

	CPU 0					CPU 1

D: Write task->state			B: Write thread_wakeup_needed
   smp_mb()				C: Read task->state
E: Read thread_wakeup_needed

with no memory barrier on CPU 1. In memory-model circles this is known as
the SB (Store Buffering) pattern, and without a full barrier on both
sides it is entirely possible for both reads to miss seeing the other
CPU's write. That's what causes CPU 0 to call schedule() with nobody to
wake it up again: CPU 0 sees that thread_wakeup_needed is still 0, and
CPU 1 sees that task->state is still TASK_RUNNING.

Please note that this analysis remains unchanged if the smp_rmb() after A
is changed to smp_mb(). The problem is not related to the assignment at
A; it is related to the assignment at B. We need a memory barrier between
B and C regardless of whether the task was woken up.

In other words, I believe the correct fix is to add smp_mb() in
f_mass_storage's wakeup_thread() routine, just after thread_wakeup_needed
is set to 1.

Before today, I had believed that wake_up_process() implicitly performed
this memory barrier in all cases. But the kerneldoc says it doesn't, and
unless Peter tells us that the kerneldoc is wrong, I see no alternative
to adding the memory barrier explicitly.
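Roughly speaking, the change I am suggesting would look like the sketch
below. It is written from memory against the current wakeup_thread(), so
treat it as an illustration of where the barrier belongs rather than as a
tested patch:

static void wakeup_thread(struct fsg_common *common)
{
	smp_wmb();	/* ensure the write of bh->state is complete */

	/* Tell the main thread that something has happened */
	common->thread_wakeup_needed = 1;

	/*
	 * Order the store to thread_wakeup_needed before the read of
	 * task->state inside wake_up_process().  This pairs with the
	 * smp_mb() implied by set_current_state() in sleep_thread(),
	 * closing the SB pattern described above.
	 */
	smp_mb();

	if (common->thread_task)
		wake_up_process(common->thread_task);
}

The existing smp_wmb() still orders the caller's bh->state update before
the flag; the new smp_mb() is what orders the flag before the read of
task->state in wake_up_process().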
Incidentally, going to a wait_queue implementation wouldn't help. With
wait_queues, the pattern for CPU 1 would be:

	thread_wakeup_needed = 1;
	wake_up(&wq);

But wake_up() is implemented as a call to __wake_up(), and the kerneldoc
for __wake_up() says:

 * It may be assumed that this function implies a write memory barrier before
 * changing the task state if and only if any tasks are woken up.

Just as in wake_up_process() -- and again, what we need is an
unconditional full barrier, not a conditional write barrier. See also
the kerneldoc for waitqueue_active() in include/linux/wait.h; it states
that an explicit smp_mb() is required on the waker.

It's hard to believe this sort of problem hasn't bitten other people in
the past. Is there a better solution than adding a memory barrier at
every place where wake_up() (or one of its siblings) is called?

Alan Stern

P.S.: This kind of solution is okay for now and is something to put in
-stable. For the longer term, however, it has hardened my resolve to
completely rewrite the locking and barriers in f_mass_storage. But I
would like to get this matter settled first: is the explicit barrier
truly necessary?
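P.P.S.: For reference, the waker/waiter pattern that the
waitqueue_active() kerneldoc describes would look roughly like the sketch
below when mapped onto this driver. It is only an illustration: the wqh
wait-queue field is hypothetical, not something f_mass_storage has today.

	/* Waker side (a completion handler) -- sketch only */
	bh->state = BUF_STATE_FULL;
	smp_mb();	/* order the store before checking for waiters */
	if (waitqueue_active(&common->wqh))	/* wqh is hypothetical */
		wake_up(&common->wqh);

	/* Waiter side (the main thread) -- sketch only */
	DEFINE_WAIT(wait);

	for (;;) {
		/* prepare_to_wait() implies smp_mb() via set_current_state() */
		prepare_to_wait(&common->wqh, &wait, TASK_INTERRUPTIBLE);
		if (bh->state == BUF_STATE_FULL)
			break;
		schedule();
	}
	finish_wait(&common->wqh, &wait);

Note that even here the full barrier on the waker side has to be written
explicitly; wake_up() by itself does not provide it, which is exactly the
problem described above.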