On Mon, 5 Sep 2016, Felipe Balbi wrote:

> Hi,
>
> Alan Stern <stern@xxxxxxxxxxxxxxxxxxx> writes:
> > On Mon, 5 Sep 2016, Felipe Balbi wrote:
> >
> >> okay, I'm thinking that's a plausible explanation, please correct me if
> >> I'm wrong. What I'm speculating now is that inside a locked region,
> >> reordering can happen (can it?)
> >
> > Yes, it can.
>
> alright, thanks
>
> >> which means that our:
> >>
> >> 	spin_lock(&common->lock);
> >> 	bh->outreq_busy = 0;
> >> 	bh->state = BUF_STATE_FULL;
> >> 	wakeup_thread(common);
> >> 	spin_unlock(&common->lock);
> >>
> >> ends up being reordered as:
> >>
> >> 	spin_lock(&common->lock);
> >> 	bh->outreq_busy = 0;
> >> 	wakeup_thread(common);
> >> 	bh->state = BUF_STATE_FULL;
> >> 	spin_unlock(&common->lock);
> >>
> >> which triggers the fault. Is that possible?
> >
> > That's why wakeup_thread needs to include a memory barrier -- to
> > prevent exactly this sort of thing from happening.
>
> but is smp_wmb() enough or do we need a full barrier?

That's more or less the question that started the other email thread.
According to Peter, the barriers already present in wake_up_process()
and try_to_wake_up() should be sufficient to prevent this reordering.

Besides, x86 is more strongly ordered than architectures like PPC or
ARM.  It rarely needs extra barriers (mostly to prevent a later read
from moving up before an earlier write).

> >> Looking at the history.git tree [0] I found a little more of the
> >> file_storage gadget's history. Here are some of the details I could
> >> find out:
> >>
> >> When you first wrote the file_storage gadget, wakeup_thread() was
> >> called outside the lock from bulk_{in,out}_complete() (see [1]). A
> >> follow-up patch ended up moving wakeup_thread() inside the locked
> >> region (while also removing a wait_queue()) which, I'm speculating,
> >> introduced the race condition (see [2]).
> >>
> >> Another piece of speculation (which I can test, if necessary) is that
> >> larger mass storage buffers (e.g. 64k or 128k) would hide this race
> >> because it would take longer for the UDC to complete the request.
> >> This might be why we never saw this before with HS-only UDCs.
> >>
> >> Are we really dealing with a regression introduced back in 2.6.15?
> >
> > Quite possibly.  Or maybe it was there from the very beginning, but it
> > doesn't show up unless you're using a fast processor.
>
> not only a fast processor, but also a really fast USB. I couldn't
> reproduce this before optimizing dwc3 (we're getting really good mass
> storage throughput)

Indeed.

Alan Stern
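
For reference, here is a minimal sketch of the waker side under discussion.
The locked sequence (bh->outreq_busy = 0, bh->state = BUF_STATE_FULL,
wakeup_thread()) is taken from the snippet quoted above; the struct layouts,
the thread_wakeup_needed flag, and the thread_task pointer are stand-ins
rather than the actual driver source.  The comment marks where an smp_wmb()
would have to sit if we did not want to rely on the barrier inside
wake_up_process():

#include <linux/sched.h>
#include <linux/spinlock.h>

/* Stripped-down stand-ins; only the fields this sketch touches. */
enum fsg_buf_state { BUF_STATE_EMPTY, BUF_STATE_FULL };

struct fsg_buffhd {
	int			outreq_busy;
	enum fsg_buf_state	state;
};

struct fsg_common {
	spinlock_t		lock;
	int			thread_wakeup_needed;	/* stand-in flag */
	struct task_struct	*thread_task;		/* stand-in */
};

static void wakeup_thread(struct fsg_common *common)
{
	/*
	 * Order the caller's bh->state store against the store to
	 * thread_wakeup_needed.  If the main thread happens to be
	 * running rather than sleeping, wake_up_process() does nothing,
	 * and the flag plus this ordering is all that keeps the new
	 * buffer state from being missed.
	 */
	smp_wmb();
	common->thread_wakeup_needed = 1;
	if (common->thread_task)
		wake_up_process(common->thread_task);
}

/* Completion-handler side, matching the sequence quoted above. */
static void bulk_out_complete_sketch(struct fsg_common *common,
				     struct fsg_buffhd *bh)
{
	spin_lock(&common->lock);
	bh->outreq_busy = 0;
	bh->state = BUF_STATE_FULL;	/* must not drift past the wakeup */
	wakeup_thread(common);
	spin_unlock(&common->lock);
}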
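
And a sketch of the sleeper side it would pair with, using the same stand-in
structures.  sleep_thread() here is a reconstruction, not a quote from the
driver; the point is that set_current_state() followed by the flag check,
paired with the barrier implied by try_to_wake_up(), is the ordering the
argument above relies on, with smp_rmb() making the earlier bh->state store
visible once the flag is seen:

/* Main-thread side: sleep until wakeup_thread() signals an event. */
static int sleep_thread(struct fsg_common *common)
{
	for (;;) {
		/*
		 * set_current_state() implies a full barrier, so the read
		 * of thread_wakeup_needed below cannot move before it.
		 * That pairs with the barrier implied by try_to_wake_up()
		 * on the waker's side.
		 */
		set_current_state(TASK_INTERRUPTIBLE);
		if (signal_pending(current)) {
			__set_current_state(TASK_RUNNING);
			return -EINTR;
		}
		if (common->thread_wakeup_needed)
			break;
		schedule();
	}
	__set_current_state(TASK_RUNNING);
	common->thread_wakeup_needed = 0;

	/*
	 * Pairs with the smp_wmb() in wakeup_thread(): make the bh->state
	 * written before the wakeup visible to this thread.
	 */
	smp_rmb();
	return 0;
}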