Re: g_mass_storage not queueing requests

Alan Stern <stern@xxxxxxxxxxxxxxxxxxx> · Mon, 5 Sep 2016 11:02:07 -0400 (EDT)

On Mon, 5 Sep 2016, Felipe Balbi wrote:

> okay, I'm thinking that's a plausible explanation, please correct me if
> I'm wrong. What I'm speculating now is that inside a locked region,
> reordering can happen (can it?)

Yes, it can.

> which means that our:
> 
> 	spin_lock(&common->lock);
> 	bh->outreq_busy = 0;
> 	bh->state = BUF_STATE_FULL;
> 	wakeup_thread(common);
> 	spin_unlock(&common->lock);
> 
> ends up being reordered as:
> 
> 	spin_lock(&common->lock);
> 	bh->outreq_busy = 0;
> 	wakeup_thread(common);
> 	bh->state = BUF_STATE_FULL;
> 	spin_unlock(&common->lock);
> 
> which triggers the fault. Is that possible?

That's why wakeup_thread needs to include a memory barrier -- to
prevent exactly this sort of thing from happening.

> Looking at the history.git tree [0] I found a little more of
> file_storage gadget's history. Here are some of the details I could find
> out:
> 
> When you first wrote the file_storage gadget, wakeup_thread() was called
> outside the lock from bulk_{in,out}_complete() (see [1]). A follow-up
> patch ended up moving wakeup_thread() inside the locked region (while
> also removing a wait_queue()) which, I'm speculating, introduced the
> race condition (see [2])
> 
> Another piece of speculation (which I can test, if necessary) is that
> larger mass storage buffers (e.g. 64k or 128k) would hide this race
> because it would take longer for UDC to complete the request. This might
> be why we never saw this before with HS-only UDCs.
> 
> Are we really dealing with a regression introduced back in 2.6.15?

Quite possibly.  Or maybe it was there from the very beginning, but it 
doesn't show up unless you're using a fast processor.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html