On Tue, 2018-03-27 at 16:51 -1000, Linus Torvalds wrote:
> On Tue, Mar 27, 2018 at 3:03 PM, Benjamin Herrenschmidt
> <benh@xxxxxxxxxxxxxxxxxxx> wrote:
> >
> > The discussion at hand is about
> >
> >         dma_buffer->foo = 1;                    /* WB */
> >         writel(KICK, DMA_KICK_REGISTER);        /* UC */
>
> Yes. That certainly is ordered on x86. In fact, afaik it's ordered
> even if that writel() might be of type WC, because that only delays
> writes, it doesn't move them earlier.

Ok so this is our answer ...

 ... snip ... (thanks for the background info !)

> Oh, the above UC case is absolutely guaranteed.

Good.

Then....

> The only issue really is that 99.9% of all testing gets done on x86
> unless you look at specific SoC drivers.
>
> On ARM, for example, there is likely little reason to care about x86
> memory ordering, because there is almost zero driver overlap between
> x86 and ARM.
>
> *Historically*, the reason for following the x86 IO ordering was
> simply that a lot of architectures used the drivers that were
> developed on x86. The alpha and powerpc workstations were *designed*
> with the x86 IO bus (PCI, then PCIe) and to work with the devices that
> came with it.
>
> ARM? PCIe is almost irrelevant. For ARM servers, if they ever take
> off, sure. But 99.99% of ARM is about their own SoC's, and so "x86
> test coverage" is simply not an issue.
>
> How much of an issue is it for Power? Maybe you decide it's not a big deal.
>
> Then all the above is almost irrelevant.

So the overlap may not be that NIL in practice :-) But even then that
doesn't matter as ARM has been happily implementing the same semantic
you describe above for years, as do we powerpc.

This is why I want (with your agreement) to define clearly, once and
for all, that the Linux semantics of writel() are that it is ordered
with previous writes to coherent memory (*).

This is already what ARM and powerpc provide and, from what you say,
what x86 provides. I don't see any reason to keep that badly documented
and have drivers randomly growing useless wmb()'s because their authors
don't think it works on x86 without them!

Once that's sorted, let's tackle the problem of mmiowb vs. spin_unlock
and the problem of writel_relaxed semantics, but as separate issues :-)

Also, can I assume the above ordering with writel() equally applies to
readl() or not? I.e.:

        dma_buf->foo = 1;
        readl(STUPID_DEVICE_DMA_KICK_ON_READ);

Does that also work on x86? (It does on power, maybe not on ARM.)

Cheers,
Ben.

(*) From a Linux API perspective, all of this is only valid if the
memory was allocated by dma_alloc_coherent(). Anything obtained by
dma_map_something() might have been bounce-buffered or might require
extra cache flushes on some architectures, and thus needs
dma_sync_for_{cpu,device} calls.
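
For illustration only, here is a rough user-space sketch of the
ordering being asked for: a store to coherent memory followed by a
writel() kick, with no wmb() in between. It is not kernel code. Every
name in it (model_dma_buffer, model_writel, model_device_poll,
model_driver_submit, model_kick_register) is hypothetical, and the C11
release/acquire pair is only an analogy for "writel() is ordered after
previous writes to coherent memory", assumed here for the sake of the
example; real MMIO ordering is defined by the architecture and bus, not
by the C11 memory model.

```c
/* User-space model only: NOT kernel code. All names are hypothetical. */
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define KICK 1u

/* Stand-in for a descriptor living in dma_alloc_coherent() memory. */
struct model_dma_buffer {
	uint32_t foo;
};

/* Stand-in for the device's MMIO kick register (starts at 0). */
static _Atomic uint32_t model_kick_register;

/*
 * Model of the proposed writel() semantics: the register store is
 * ordered after all earlier stores to coherent memory. A C11 release
 * store is used as an analogy for that one-way ordering.
 */
static void model_writel(uint32_t value, _Atomic uint32_t *reg)
{
	atomic_store_explicit(reg, value, memory_order_release);
}

/*
 * Model of the device side: if it observes the kick (acquire load), it
 * "DMAs" the descriptor and returns the foo value it sees. Returns -1
 * if no kick has been observed yet.
 */
static int64_t model_device_poll(const struct model_dma_buffer *buf)
{
	assert(buf);
	if (atomic_load_explicit(&model_kick_register,
				 memory_order_acquire) != KICK)
		return -1;
	return buf->foo;
}

/* Driver side: no explicit wmb() between the descriptor write and the kick. */
static void model_driver_submit(struct model_dma_buffer *buf)
{
	assert(buf);
	buf->foo = 1;					/* coherent memory write */
	model_writel(KICK, &model_kick_register);	/* ordered after it */
}
```

In this model, a device that has seen the kick must also see foo == 1,
which is the guarantee drivers would rely on instead of adding their
own wmb(). The sketch says nothing about memory obtained from the
dma_map_*() family, which still needs the sync calls mentioned in the
footnote.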