On Wednesday 10 June 2015 06:31 PM, Will Deacon wrote: > On Wed, Jun 10, 2015 at 11:58:40AM +0100, Peter Zijlstra wrote: >> On Wed, Jun 10, 2015 at 09:34:18AM +0000, Vineet Gupta wrote: >>> On Tuesday 09 June 2015 06:10 PM, Peter Zijlstra wrote: >> I think the most interesting part is the device side. >> >>>>> +/* >>>>> + * DSYNC: >>>>> + * - Waits for completion of all outstanding memory operations before any new >>>>> + * operations can begin >>>>> + * - Includes implicit memory operations such as cache/TLB/BPU maintenance ops >>>>> + * - Lighter version of SYNC as it doesn't wait for non-memory operations >>>>> + */ >>>>> +#define mb() asm volatile("dsync\n" : : : "memory") >>>> So mb() is supposed to order against things like DMA memory ops, is DMA >>>> part of point 1 or 3, if 3, this is not a suitable instruction. >>> Can u please explain the DMA case a bit more ? From what I understood and used in >>> say ethernet driver, it is more of a line drawn between say cpu updating a shared >>> buffer descriptor and kicking a MMIO register (which in turn could initiate a DMA) >>> but I'm not sure how mb() can possibly order with DMA per se (unless there's some >>> advanced form of IO-coherency) >> I'm afraid I might not be the best of sources here, I tend to stay away >> from actual device stuff like that. I've Cc'ed Will Deacon who might be >> able to shed a bit more light on this aspect. > I'd definitely expect mb() to order arbitrary memory accesses against each > other (i.e. regardless of whether or not they're to RAM or MMIO devices). > Some drivers use it to "flush the writebuffer" but I don't think that makes > a whole lot of sense. Certainly, on ARM, if we want to know that something > reached an MMIO endpoint then we'll need a read-back as well as the barrier > for the general case. > > You also need that guarantee in your readl/writel family of macros. It's > extremely heavy and rarely needed, which is why I added the _relaxed > versions to all architectures. Wow - adding that to these accessors will really be heavy - given that a whole bunch of drivers still use the stock API (or perhaps don't know / care whether they need the readl or the relaxed api. And it is practically impossible to switch them over - after if ain't broken how can u fix it. So far we've been testing this implementation (readl/writel - w/o any explicit barrier) on slower FPGA builds and this includes a whole bunch of designware IP - mmc, eth, gpio.... and don't see any ill effects - do you reckon we still need to add it. > The "ordering against DMA" is something like reading an MMIO register to > determine whether the DMA has completed, then going off to read the contents > out of the DMA buffer. The comment you have about DSYNC makes it sound like > it's not sufficient for this case. IMHO this use case is slightly pedantic - since DMA completion will typically follow up with an interrupt (I understand it's still possible to poll a dma status reg). at any rate when it comes to dwaring a line between memory accesses - regular or mmio, DSYNC is all we got in the ISA so ARCV2 mb() has to use it - there's no better option. -Vineet > Will > -- To unsubscribe from this list: send the line "unsubscribe linux-arch" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html