Hi, Thinh Nguyen <Thinh.Nguyen@xxxxxxxxxxxx> writes: > f_mass_storage has a memorry barrier issue with the sleep and wake > functions that can cause a deadlock. This results in intermittent hangs > during MSC file transfer. The host will reset the device after receiving > no response to resume the transfer. This issue is seen when dwc3 is > processing 2 transfer-in-progress events at the same time, invoking > completion handlers for CSW and CBW. Also this issue occurs depending on > the system timing and latency. > > To increase the chance to hit this issue, you can force dwc3 driver to > wait and process those 2 events at once by adding a small delay (~100us) > in dwc3_check_event_buf() whenever the request is for CSW and read the > event count again. Avoid debugging with printk and ftrace as extra > delays and memory barrier will mask this issue. > > Scenario which can lead to failure: > ----------------------------------- > 1) The main thread sleeps and waits for the next command in > get_next_command(). > 2) bulk_in_complete() wakes up main thread for CSW. > 3) bulk_out_complete() tries to wake up the running main thread for CBW. > 4) thread_wakeup_needed is not loaded with correct value in > sleep_thread(). > 5) Main thread goes to sleep again. > > The pattern is shown below. Note the 2 critical variables. > * common->thread_wakeup_needed > * bh->state > > CPU 0 (sleep_thread) CPU 1 (wakeup_thread) > ============================== =============================== > > bh->state = BH_STATE_FULL; > smp_wmb(); > thread_wakeup_needed = 0; thread_wakeup_needed = 1; > smp_rmb(); > if (bh->state != BH_STATE_FULL) > sleep again ... > > As pointed out by Alan Stern, this is an R-pattern issue. The issue can > be seen when there are two wakeups in quick succession. The > thread_wakeup_needed can be overwritten in sleep_thread, and the read of > the bh->state maybe reordered before the write to thread_wakeup_needed. > > This patch applies full memory barrier smp_mb() in both sleep_thread() > and wakeup_thread() to ensure the order which the thread_wakeup_needed > and bh->state are written and loaded. > > However, a better solution in the future would be to use wait_queue > method that takes care of managing memory barrier between waker and > waiter. > > Cc: stable@xxxxxxxxxxxxxxx > Signed-off-by: Thinh Nguyen <thinhn@xxxxxxxxxxxx> Alan, just to make sure you're okay with this patch. Can I have your acked-by? -- balbi
Attachment:
signature.asc
Description: PGP signature