f_mass_storage has a memorry barrier issue with the sleep and wake functions that can cause a deadlock. This results in intermittent hangs during MSC file transfer. The host will reset the device after receiving no response to resume the transfer. This issue is seen when dwc3 is processing 2 transfer-in-progress events at the same time, invoking completion handlers for CSW and CBW. Also this issue occurs depending on the system timing and latency. To increase the chance to hit this issue, you can force dwc3 driver to wait and process those 2 events at once by adding a small delay (~100us) in dwc3_check_event_buf() whenever the request is for CSW and read the event count again. Avoid debugging with printk and ftrace as extra delays and memory barrier will mask this issue. Scenario which can lead to failure: ----------------------------------- 1) The main thread sleeps and waits for the next command. 2) bulk_in_complete() wakes up main thread for CSW. 3) bulk_out_complete() tries to wake up the running main thread for CBW. 4) thread_wakeup_needed is not loaded with correct value in sleep_thread(). 5) Main thread goes to sleep. See more detail below: Note the 2 critical variables. * common->thread_wakeup_needed * bh->state Eval'd | CPU0 (main thread) | CPU1 --------+------------------------------------------+---------------------- | get_next_command() | | ... | while(1)| while (bh->state != BUF_STATE_FULL) { | | for (;;) { | | set_current_state(TASK_INTERRUPTIBLE);| if(0) | if (thread_wakeup_needed) | | schedule() | | | bulk_in_complete() { | | smp_wmb(); | | bh->state = BUF_STATE_EMPTY | | smp_wmb(); | | thread_wakeup_needed = 1 | | wake_up_process() | | } | /* 2nd for loop iteration */ | | for (;;) { | | set_current_state(TASK_INTERRUPTIBLE);| if(1) | if (thread_wakeup_needed) | | break | | } | | | bulk_out_complete() { | | smp_wmb(); | __set_current_state(TASK_RUNNING); | | thread_wakeup_needed = 0; | | smp_rmb(); | | | | /* 2nd while loop iteration */ | while(1)| while (bh->state != BUF_STATE_FULL) { | | | bh->state = BUF_STATE_FULL; | | smp_wmb(); | -->| thread_wakeup_needed = 1 | | wake_up_process() | | } | for (;;) { | | set_current_state(TASK_INTERRUPTIBLE);| if(0) | if (thread_wakeup_needed) |<-- | schedule() | | | | <THREAD GOES TO SLEEP AND WAITS> | thread_wakeup_needed is not guaranteed to be loaded before checking it. Note that this happens when wake_up_process() is called on a running process. This patch fixes this issue by explicitly use smp_mb() after clearing of thread_wakeup_needed in the sleep function. It serializes the order in which thread_wakeup_needed is loaded and stored, thus prevent loading the wrong value when checking it in the sleeper. However, a better solution in the future would be to use wait_queue method that takes care of managing memory barrier between waker and waiter. See DEFINE_WAIT_FUNC comment in kernel scheduling wait.c as this solution is similar to its implementation. Signed-off-by: Thinh Nguyen <thinhn@xxxxxxxxxxxx> --- drivers/usb/gadget/function/f_mass_storage.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/usb/gadget/function/f_mass_storage.c b/drivers/usb/gadget/function/f_mass_storage.c index 4c8aacc232c0..f40433eb8259 100644 --- a/drivers/usb/gadget/function/f_mass_storage.c +++ b/drivers/usb/gadget/function/f_mass_storage.c @@ -627,7 +627,7 @@ static int sleep_thread(struct fsg_common *common, bool can_freeze) } __set_current_state(TASK_RUNNING); common->thread_wakeup_needed = 0; - smp_rmb(); /* ensure the latest bh->state is visible */ + smp_mb(); /* ensure the latest bh->state is visible */ return rc; } -- 2.11.0 -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html