On Mon, 5 Sep 2016, Felipe Balbi wrote: > okay, I'm thinking that's a plausible explanation, please correct me if > I'm wrong. What I'm speculating now is that inside a locked region, > reordering can happen (can it?) Yes, it can. > which means that our: > > spin_lock(&common->lock); > bh->outreq_busy = 0; > bh->state = BUF_STATE_FULL; > wakeup_thread(common); > spin_unlock(&common->lock); > > ends up being reordered as: > > spin_lock(&common->lock); > bh->outreq_busy = 0; > wakeup_thread(common); > bh->state = BUF_STATE_FULL; > spin_unlock(&common->lock); > > which triggers the fault. Is that possible? That's why wakeup_thread needs to include a memory barrier -- to prevent exactly this sort of thing from happening. > Looking at the history.git tree [0] I found a little more of > file_storage gadget's history. Here are some of the details I could find > out: > > When you first wrote the file_storage gadget, wakeup_thread() was called > outside the lock from bulk_{in,out}_complete() (see [1]). A follow-up > patch ended up moving wakeup_thread() inside the locked region (while > also removing a wait_queue()) which, I'm speculating, introduced the > race condition (see [2]) > > Another piece of speculation (which I can test, if necessary) is that > larger mass storage buffers (e.g. 64k or 128k) would hide this race > because it would take longer for UDC to complete the request. This might > be why we never saw this before with HS-only UDCs. > > Are we really dealing with a regression introduced back in 2.6.15? Quite possibly. Or maybe it was there from the very beginning, but it doesn't show up unless you're using a fast processor. Alan Stern -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html