Re: [PATCH 3/3] usb: gadget: f_mass_storage: Serialize wake and sleep execution

On Mon, Apr 10, 2017 at 11:01:59AM -0400, Alan Stern wrote:
> On Mon, 10 Apr 2017, Felipe Balbi wrote:
> 
> > Hi,
> > 
> > Thinh Nguyen <Thinh.Nguyen@xxxxxxxxxxxx> writes:
> > > f_mass_storage has a memory barrier issue in its sleep and wake
> > > functions that can cause a deadlock. This results in intermittent hangs
> > > during MSC file transfer; the host resets the device after receiving no
> > > response, in order to resume the transfer. The issue is seen when dwc3
> > > processes 2 transfer-in-progress events at the same time, invoking the
> > > completion handlers for CSW and CBW. Whether it occurs also depends on
> > > system timing and latency.
> > >
> > > To increase the chance of hitting this issue, you can force the dwc3
> > > driver to wait and process those 2 events at once by adding a small
> > > delay (~100us) in dwc3_check_event_buf() whenever the request is for
> > > CSW, and then reading the event count again. Avoid debugging with
> > > printk and ftrace, as the extra delays and memory barriers will mask
> > > this issue.
> 
> > >
> > > Scenario which can lead to failure:
> > > -----------------------------------
> > > 1) The main thread sleeps and waits for the next command.
> > > 2) bulk_in_complete() wakes up main thread for CSW.
> > > 3) bulk_out_complete() tries to wake up the running main thread for CBW.
> > > 4) thread_wakeup_needed is not loaded with the correct value in sleep_thread().
> > > 5) Main thread goes to sleep.
> > >
> > > See more detail below:
> > >
> > > Note the 2 critical variables.
> > >  * common->thread_wakeup_needed
> > >  * bh->state
> > >
> > > Eval'd  | CPU0 (main thread)			   | CPU1
> > > --------+------------------------------------------+----------------------
> > >         | get_next_command()			   |
> > >         | ...					   |
> > > while(1)| while (bh->state != BUF_STATE_FULL) {	   |
> > >         |  for (;;) {				   |
> > >         |    set_current_state(TASK_INTERRUPTIBLE);|
> > > if(0)   |    if (thread_wakeup_needed)		   |
> > >         |    schedule()				   |
> > > 	|					   | bulk_in_complete() {
> > > 	|					   | smp_wmb();
> > > 	|					   | bh->state = BUF_STATE_EMPTY
> > > 	|					   | smp_wmb();
> > > 	|					   | thread_wakeup_needed = 1
> > > 	|					   | wake_up_process()
> > > 	|					   | }
> > > 	|  /* 2nd for loop iteration */		   |
> > >         |  for (;;) {				   |
> > >         |    set_current_state(TASK_INTERRUPTIBLE);|
> > > if(1)   |    if (thread_wakeup_needed)		   |
> > >         |      break				   |
> > >         |  }					   |
> > > 	|					   | bulk_out_complete() {
> > > 	|					   | smp_wmb();
> > > 	|  __set_current_state(TASK_RUNNING);	   |
> > > 	|  thread_wakeup_needed = 0;		   |
> > > 	|  smp_rmb();				   |
> > > 	|					   |
> > > 	|  /* 2nd while loop iteration */	   |
> > > while(1)| while (bh->state != BUF_STATE_FULL) {	   |
> > > 	|					   | bh->state = BUF_STATE_FULL;
> > > 	|					   | smp_wmb();
> > > 	|					-->| thread_wakeup_needed = 1
> > > 	|					   | wake_up_process()
> > > 	|					   | }
> > >         |  for (;;) {				   |
> > >         |    set_current_state(TASK_INTERRUPTIBLE);|
> > > if(0)   |    if (thread_wakeup_needed)		   |<--
> > >         |    schedule()				   |
> > >         |					   |
> > > 	|	<THREAD GOES TO SLEEP AND WAITS>   |
> > >
> > > The latest value of thread_wakeup_needed is not guaranteed to be loaded
> > > before checking it. Note that this happens when wake_up_process() is
> > > called on a running process.
> > >
> > > This patch fixes the issue by explicitly using smp_mb() after clearing
> > > thread_wakeup_needed in the sleep function. It serializes the order in
> > > which thread_wakeup_needed is stored and loaded, thus preventing the
> > > sleeper from reading a stale value when checking it.
> > >
> > > However, a better solution in the future would be to use the wait_queue
> > > mechanism, which takes care of managing the memory barriers between
> > > waker and waiter.
> > >
> > > See the DEFINE_WAIT_FUNC comment in the kernel scheduler's wait.c, as
> > > this solution is similar to its implementation.
> > >
> > > Signed-off-by: Thinh Nguyen <thinhn@xxxxxxxxxxxx>
> > > ---
> > >  drivers/usb/gadget/function/f_mass_storage.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/usb/gadget/function/f_mass_storage.c b/drivers/usb/gadget/function/f_mass_storage.c
> > > index 4c8aacc232c0..f40433eb8259 100644
> > > --- a/drivers/usb/gadget/function/f_mass_storage.c
> > > +++ b/drivers/usb/gadget/function/f_mass_storage.c
> > > @@ -627,7 +627,7 @@ static int sleep_thread(struct fsg_common *common, bool can_freeze)
> > >  	}
> > >  	__set_current_state(TASK_RUNNING);
> > >  	common->thread_wakeup_needed = 0;
> > > -	smp_rmb();	/* ensure the latest bh->state is visible */
> > > +	smp_mb();	/* ensure the latest bh->state is visible */
> > >  	return rc;
> > >  }
> > >  
> > > -- 
> > > 2.11.0
> > 
> > Alan, do you have any comments here? Do we really need a full barrier in
> > this situation?
> 
> Peter Z has pointed out in the past that the locking and barrier
> mechanisms in f_mass_storage are not well thought out.  He also
> suggested using a wait_queue rather than doing some home-made
> sleep/wake calculations.  I agree that this would simplify the code.
> 
> However, in this case I do not believe the bug described above will be 
> fixed by changing the smp_rmb() to smp_mb() or by going to a 
> wait_queue.  Here's my reasoning; maybe I'm wrong...
> 
> The important part of this race can be written schematically like so:
> 
> 	CPU 0				CPU 1
> 
> A:	thread_wakeup_needed = 0;
> 	smp_rmb();
> 	...
> B:					thread_wakeup_needed = 1;
> C:					wake_up_process();
> 
> D:	set_current_state(TASK_INTERRUPTIBLE);
> E:	if (thread_wakeup_needed)
> 		break;
> 	schedule();
> 
> Now, set_current_state() uses smp_store_mb(), so D is equivalent to
> 
> D:	current->state = TASK_INTERRUPTIBLE;
> 	smp_mb();
> 
> Furthermore, wake_up_process(task) must read task->state in order to
> see whether task needs to be woken up.  HOWEVER, the kerneldoc for
> wake_up_process() says:
> 
>  * It may be assumed that this function implies a write memory barrier before
>  * changing the task state if and only if any tasks are woken up.
> 
> (Side point: This says _write_ memory barrier, not _full_ memory
> barrier, which is what is needed here.)
> 
> So if CPU 0 was not yet in TASK_INTERRUPTIBLE when CPU 1 ran
> wake_up_process(), then CPU 1 would not execute any barriers and we
> would have:
> 
> 	CPU 0				CPU 1
> 	D: Write task->state		B: Write thread_wakeup_needed
> 	   smp_mb()			C: Read task->state
> 	E: Read thread_wakeup_needed
> 
> with no memory barrier on CPU 1.  In memory model circles this is known
> as the SB (Store Buffering) pattern, and without a full barrier on both
> sides it is entirely possible for both reads to miss seeing the other
> CPU's write.  That's what causes CPU 0 to call schedule() with nobody
> to wake it up again: CPU 0 sees that thread_wakeup_needed is still 0, 
> and CPU 1 sees that task->state is still TASK_RUNNING.
> 
> Please note that this analysis remains unchanged if the smp_rmb() after 
> A is changed to smp_mb().  The problem is not related to the 
> assignment at A; it is related to the assignment at B.  We need to have 
> a memory barrier between B and C regardless of whether the task was 
> woken up.
> 
> In other words, I believe the correct fix is to add smp_mb() in 
> f_mass_storage's wakeup_thread() routine, just after 
> thread_wakeup_needed is set to 1.
> 
> Before today, I had believed that wake_up_process() implicitly
> performed this memory barrier in all cases.  But the kerneldoc says it
> doesn't, and unless Peter tells us that the kerneldoc is wrong, I see
> no alternative to adding the memory barrier explicitly.
> 
> Incidentally, going to a wait_queue implementation wouldn't help.  With 
> wait_queues, the pattern for CPU 1 would be:
> 
> 	thread_wakeup_needed = 1;
> 	wake_up(&wq);
> 
> But wake_up() is implemented as a call to __wake_up(), and the 
> kerneldoc for __wake_up() says:
> 
>  * It may be assumed that this function implies a write memory barrier before
>  * changing the task state if and only if any tasks are woken up.
> 
> Just as in wake_up_process() -- and again, what we need is an 
> unconditional full barrier, not a conditional write barrier.  See also 
> the kerneldoc for waitqueue_active() in include/linux/wait.h; it states 
> that an explicit smp_mb() is required on the waker.
> 
> It's hard to believe this sort of problem hasn't bitten other people in 
> the past.  Is there a better solution than to add a memory barrier 
> every place where wake_up() (or one of its siblings) is called?
> 
> Alan Stern
> 
> P.S.: This kind of solution is okay for right away and something to put 
> in -stable.  For the longer term, however, it has hardened my resolve 
> to completely rewrite the locking and barriers in f_mass_storage.
> 
> But I would like to get this matter settled first.  Is the explicit 
> barrier truly necessary?

If you are using wait_event()/wake_up() or friends, the explicit
barrier -is- necessary.  To see this, look at v4.10's wait_event():

	#define wait_event(wq, condition)				\
	do {								\
		might_sleep();						\
		if (condition)						\
			break;						\
		__wait_event(wq, condition);				\
	} while (0)

As you can see, if the condition is set just before the wait_event()
macro checks it, there is no ordering whatsoever.  And if wake_up()
finds nothing to wake up, there is no relevant ordering on that side,
either.

So you had better supply your own ordering, period, end of story.

It is reasonable to ask why we don't change wait_event()/wake_up() to
supply the needed ordering.  And in the case of wait_event(), you could
imagine placing a memory barrier between the check of "condition" and
the "break" statement.  But it would need to be a full memory barrier --
you cannot do smp_load_acquire() on a complex condition, after all! --
and in the common use case where a lock is used, this is just pure
useless overhead.

Worse yet, you just cannot put any ordering in wake_up() that will
solve the problem.  If you are using a need_wake_up flag, the code
has to look as follows:

	changes_that_must_be_visible_to_woken_thread();
	smp_store_release(&need_wake_up, true);
	wake_up(&wq);

In the special case where there is a single wakeup flag, you could add
it to the wake_up() and wait_event APIs:

	changes_that_must_be_visible_to_woken_thread();
	wake_up_release(&wq, &need_wake_up);

	-----

	wait_event_acquire(wq, &need_wake_up);

But RCU's use of wait_event()/wake_up() tends to have complex conditions
that wait_event_acquire() simply cannot handle:

-	atomic_read(&n_rcu_perf_writer_finished) >= nrealwriters

-	(newphase = smp_load_acquire(&barrier_phase)) != lastphase ||
	torture_must_stop())

-	atomic_read(&barrier_cbs_count) == 0 || torture_must_stop()

-	!READ_ONCE(sp->srcu_lock_nesting[idx])

-	rsp->gp_state == GP_PASSED

-	READ_ONCE(rsp->gp_flags) & RCU_GP_FLAG_INIT

-	rcu_gp_fqs_check_wake(rsp, &gf)

-	sync_exp_work_done(rsp, &rdp->exp_workdone2, s)

-	sync_rcu_preempt_exp_done(rnp_root)

-	sync_exp_work_done(rsp, &rdp->exp_workdone0, s)

-	atomic_read(&oom_callback_count) == 0

-	(d = ULONG_CMP_GE(READ_ONCE(rnp->completed), c))

+	!READ_ONCE(my_rdp->nocb_leader_sleep)

	--- This -could- use swait_event_interruptible_acquire()
	    if the true/false sense of ->nocb_leader_sleep was
	    inverted, which it could be.

?	READ_ONCE(rdp->nocb_follower_head)

	--- Except that there is a polling code path that requires
	    ordering in any case.

+	rcu_tasks_cbs_head

So this API would help 2-3 RCU uses of 15, so it is not clear to me that
it is worthwhile.  But if there are enough use cases in other subsystems,
why not?

							Thanx, Paul
