On Mon, Sep 05, 2016 at 11:29:26AM -0400, Alan Stern wrote:
> On Mon, 5 Sep 2016, Peter Zijlstra wrote:
> 
> > > Actually, seeing it written out like this, one realizes that it really
> > > ought to be:
> > > 
> > > 	CPU 0				CPU 1
> > > 	-----				-----
> > > 	Start DMA			Handle DMA-complete irq
> > > 	Sleep until bh->state		smp_mb()
> > > 					set bh->state
> > > 					Wake up CPU 0
> > > 	smp_mb()
> > > 	Compute rc based on contents of the DMA buffer
> > > 
> > > (Bear in mind also that on some platforms, the I/O operation is carried
> > > out by PIO rather than DMA.)
> 
> > Also, smp_mb() doesn't necessarily interact with MMIO/DMA at all IIRC.
> > Its only defined to do CPU/CPU interactions.
> 
> Suppose the DMA master finishes filling up an input buffer and issues a
> completion irq to CPU1.  Presumably the data would then be visible to
> CPU1 if the interrupt handler looked at it.  So if CPU1 executes smp_mb
> before setting bh->state and waking up CPU0, and if CPU0 executes
> smp_mb after testing bh->state and before reading the data buffer,
> wouldn't CPU0 then see the correct data in the buffer?  Even if CPU0
> never did go to sleep?

A couple of notes here: I would expect the DMA master to make its stores
_globally_ visible on 'completion', because I'm not sure our smp_mb()
would help make them globally visible; it's only defined for CPU/CPU
interactions, not external agents.

Note that ARM, for example, makes the distinction explicit: smp_mb()
uses a DMB-ISH barrier, dma_[rw]mb() uses DMB-OSH{LD,ST}, and mb() uses
DSB-SY. The ISH domain is Inner-SHareable (IIRC) and only includes CPUs;
the OSH domain is Outer-SHareable and adds external agents like DMA (it
also includes the CPUs). The DSB-SY variant is heavier still and syncs
the whole system; I always forget the exact details.

> Or would something more be needed?

The thing is, this is x86 (TSO). Almost everything is globally
ordered/visible, and almost all barriers are full barriers. The only
reordering TSO allows is that a later read can happen before a prior
store.
That is, only:

	X = 1;
	smp_mb();
	r = Y;

actually needs a full barrier. All the other variants are no-ops.

Note that x86's dma_[rw]mb() are no-ops (like the regular
smp_[rw]mb()). Which seems to imply DMA is coherent and globally
visible.

> > I would very much expect the device IO stuff to order things for us in
> > this case. "Start DMA" should very much include sufficient fences to
> > ensure the data under DMA is visible to the DMA engine, this would very
> > much include things like flushing store buffers and maybe even writeback
> > caches, depending on platform needs.
> > 
> > At the same time, I would expect "Handle DMA-complete irq", even if it
> > were done with a PIO polling loop, to guarantee ordering against later
> > operations such that 'complete' really means that.
> 
> That's what I would expect too.
> 
> Back in the original email thread where the problem was first reported,
> Felipe says that the problem appears to be something else.  Here's what
> it looks like now, in schematic form:
> 
> 	CPU0
> 	----
> 	get_next_command():
> 		while (bh->state != BUF_STATE_EMPTY)
> 			sleep_thread();
> 		start an input request for bh
> 		while (bh->state != BUF_STATE_FULL)
> 			sleep_thread();
> 
> As mentioned above, the input involves DMA and is terminated by an irq.
> The request's completion handler is bulk_out_complete():
> 
> 	CPU1
> 	----
> 	bulk_out_complete():
> 		bh->state = BUF_STATE_FULL;
> 		wakeup_thread();
> 
> According to Felipe, when CPU0 wakes up and checks bh->state, it sees a
> value different from BUF_STATE_FULL.  So it goes back to sleep again
> and doesn't make any forward progress.
> 
> It's possible that something else is changing bh->state when it
> shouldn't.  But if this were the explanation, why would Felipe see that
> the problem goes away when he changes the memory barriers in
> sleep_thread() and wakeup_thread() to smp_mb()?

Good question, let me stare at the code more.
Could you confirm that bulk_{in,out}_complete() work on different
usb_request structures, and that they cannot, at any time, get called
on the _same_ request?