On Mon, 5 Sep 2016, Felipe Balbi wrote:

> Hi,
>
> Alan Stern <stern@xxxxxxxxxxxxxxxxxxx> writes:
> > On Mon, 5 Sep 2016, Felipe Balbi wrote:
> >
> >> okay, I'm thinking that's a plausible explanation, please correct me if
> >> I'm wrong. What I'm speculating now is that inside a locked region,
> >> reordering can happen (can it?)
> >
> > Yes, it can.
>
> alright, thanks
>
> >> which means that our:
> >>
> >> 	spin_lock(&common->lock);
> >> 	bh->outreq_busy = 0;
> >> 	bh->state = BUF_STATE_FULL;
> >> 	wakeup_thread(common);
> >> 	spin_unlock(&common->lock);
> >>
> >> ends up being reordered as:
> >>
> >> 	spin_lock(&common->lock);
> >> 	bh->outreq_busy = 0;
> >> 	wakeup_thread(common);
> >> 	bh->state = BUF_STATE_FULL;
> >> 	spin_unlock(&common->lock);
> >>
> >> which triggers the fault. Is that possible?
> >
> > That's why wakeup_thread needs to include a memory barrier -- to
> > prevent exactly this sort of thing from happening.
>
> but is smp_wmb() enough or do we need a full barrier?

That's more or less the question that started the other email thread.
According to Peter, the barriers already present in wake_up_process()
and try_to_wake_up() should be sufficient to prevent this reordering.

Besides, x86 is more strongly ordered than architectures like PPC or
ARM.  It rarely needs extra barriers (mostly to prevent a later read
from moving up before an earlier write).

> >> Looking at the history.git tree [0] I found a little more of the
> >> file_storage gadget's history. Here are some of the details I could
> >> find out:
> >>
> >> When you first wrote the file_storage gadget, wakeup_thread() was
> >> called outside the lock from bulk_{in,out}_complete() (see [1]). A
> >> follow-up patch ended up moving wakeup_thread() inside the locked
> >> region (while also removing a wait_queue()) which, I'm speculating,
> >> introduced the race condition (see [2]).
> >>
> >> Another piece of speculation (which I can test, if necessary) is that
> >> larger mass storage buffers (e.g. 64k or 128k) would hide this race
> >> because it would take longer for the UDC to complete the request.
> >> This might be why we never saw this before with HS-only UDCs.
> >>
> >> Are we really dealing with a regression introduced back in 2.6.15?
> >
> > Quite possibly.  Or maybe it was there from the very beginning, but it
> > doesn't show up unless you're using a fast processor.
>
> not only a fast processor, but also a really fast USB. I couldn't
> reproduce this before optimizing dwc3 (we're getting really good mass
> storage throughput)

Indeed.

Alan Stern
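
For reference, here is a minimal sketch of the waker side under discussion.
The locked sequence (bh->outreq_busy = 0, bh->state = BUF_STATE_FULL,
wakeup_thread()) is taken from the snippet quoted above; the struct layouts,
the thread_wakeup_needed flag, and the thread_task pointer are stand-ins
rather than the actual driver source.  The comment marks where an smp_wmb()
would have to sit if we did not want to rely on the barrier inside
wake_up_process():

#include <linux/sched.h>
#include <linux/spinlock.h>

/* Stripped-down stand-ins; only the fields this sketch touches. */
enum fsg_buf_state { BUF_STATE_EMPTY, BUF_STATE_FULL };

struct fsg_buffhd {
	int			outreq_busy;
	enum fsg_buf_state	state;
};

struct fsg_common {
	spinlock_t		lock;
	int			thread_wakeup_needed;	/* stand-in flag */
	struct task_struct	*thread_task;		/* stand-in */
};

static void wakeup_thread(struct fsg_common *common)
{
	/*
	 * Order the caller's bh->state store against the store to
	 * thread_wakeup_needed.  If the main thread happens to be
	 * running rather than sleeping, wake_up_process() does nothing,
	 * and the flag plus this ordering is all that keeps the new
	 * buffer state from being missed.
	 */
	smp_wmb();
	common->thread_wakeup_needed = 1;
	if (common->thread_task)
		wake_up_process(common->thread_task);
}

/* Completion-handler side, matching the sequence quoted above. */
static void bulk_out_complete_sketch(struct fsg_common *common,
				     struct fsg_buffhd *bh)
{
	spin_lock(&common->lock);
	bh->outreq_busy = 0;
	bh->state = BUF_STATE_FULL;	/* must not drift past the wakeup */
	wakeup_thread(common);
	spin_unlock(&common->lock);
}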
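
And a sketch of the sleeper side it would pair with, using the same stand-in
structures.  sleep_thread() here is a reconstruction, not a quote from the
driver; the point is that set_current_state() followed by the flag check,
paired with the barrier implied by try_to_wake_up(), is the ordering the
argument above relies on, with smp_rmb() making the earlier bh->state store
visible once the flag is seen:

/* Main-thread side: sleep until wakeup_thread() signals an event. */
static int sleep_thread(struct fsg_common *common)
{
	for (;;) {
		/*
		 * set_current_state() implies a full barrier, so the read
		 * of thread_wakeup_needed below cannot move before it.
		 * That pairs with the barrier implied by try_to_wake_up()
		 * on the waker's side.
		 */
		set_current_state(TASK_INTERRUPTIBLE);
		if (signal_pending(current)) {
			__set_current_state(TASK_RUNNING);
			return -EINTR;
		}
		if (common->thread_wakeup_needed)
			break;
		schedule();
	}
	__set_current_state(TASK_RUNNING);
	common->thread_wakeup_needed = 0;

	/*
	 * Pairs with the smp_wmb() in wakeup_thread(): make the bh->state
	 * written before the wakeup visible to this thread.
	 */
	smp_rmb();
	return 0;
}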