On Fri, Jun 19, 2015 at 02:13:02PM +0100, Vineet Gupta wrote:
> On Thursday 11 June 2015 07:09 PM, Will Deacon wrote:
> > On Thu, Jun 11, 2015 at 01:13:28PM +0100, Vineet Gupta wrote:
> >> On Wednesday 10 June 2015 06:31 PM, Will Deacon wrote:
> >>> You also need that guarantee in your readl/writel family of macros. It's
> >>> extremely heavy and rarely needed, which is why I added the _relaxed
> >>> versions to all architectures.
> >>
> >> Wow - adding that to these accessors will really be heavy - given that a whole
> >> bunch of drivers still use the stock API (or perhaps don't know / care whether
> >> they need the readl or the relaxed api. And it is practically impossible to switch
> >> them over - after if ain't broken how can u fix it. So far we've been testing this
> >> implementation (readl/writel - w/o any explicit barrier) on slower FPGA builds and
> >> this includes a whole bunch of designware IP - mmc, eth, gpio.... and don't see
> >> any ill effects - do you reckon we still need to add it.
> >
> > Unfortunately, yes, as that's effectively what the kernel requires:
> >
> > http://marc.info/?l=linux-kernel&m=121192394430581&w=2
> > http://thread.gmane.org/gmane.linux.ide/46414
>
> Oh great - thx for those !
>
> > The conclusion is that x86 *does* provide this ordering in its accessors
> > and drivers are written to assume that, so either you go round fixing all
> > the drivers by adding the missing barriers or you implement it in your
> > accessors (like we have done on ARM). Subtle I/O ordering issues are no
> > fun to debug.
> >
> > That's also the reason I added the _relaxed versions, so you can port
> > drivers one-by-one to the weaker semantics whilst having the potentially
> > broken drivers continue to work.
> >
>
> OK, so given that regular/mmio is also weakly ordered, it would seem that we need
> full mb() *before* and *after* the IO access in the non relaxed API. ARM code
> seems to put a rmb() after the readl and wmb() before the writel.
> Is that based on how h/w provides for some ?

We figured that you'd likely be doing something like:

  <writel_relaxed DMA buffer>
  <writel MMIO "go" reg>

or:

  <readl MMIO "status" reg>
  <readl_relaxed DMA buffer>

so ended up with writel doing {wmb(); writel_relaxed} and readl doing
{readl_relaxed; rmb()}.

> In one of the links you posted above, Catalin posed the same question, but I
> didn't see response to that.
>
> | If we are to make the writel/readl on ARM fully ordered with both IO
> | (enforced by hardware) and uncached memory, do we add barriers on each
> | side of the writel/readl etc.? The common cases would require a barrier
> | before writel (write buffer flushing) and a barrier after readl (in case
> | of polling for a "DMA complete" state).
> |
> | So if io_wmb() just orders to IO writes (writel_relaxed), does it mean
> | that we still need a mighty wmb() that orders any type of accesses (i.e.
> | uncached memory vs IO)? Can drivers not use the strict writel() and no
> | longer rely on wmb() (wondering whether we could simplify it on ARM with
> | fully ordered IO accessors)?
>
> Further readl/writel would be no different than ioread32/iowrite32 ?

ioread32/iowrite32 can be used with port addresses and dispatch to the
relevant accessors depending on that. The memory ordering semantics should
be the same as readl/writel.

> FWIW, h/w folks tell me that DMB guarantees local barrier semantics so we don't
> need to use DSYNC. Latter only provides full r+w+TLB/BPU stuff while DMB allows
> finer grained r/w/r+w. But if we need full mb then using one vs. other becomes a
> moot point.

I'd say go with what we do on ARM/arm64, then at least we have consistency
in the use of barriers.

Will
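
For illustration, the {wmb(); writel_relaxed} / {readl_relaxed; rmb()}
composition described above can be sketched as a small userspace model. This
is not the ARM or ARC kernel code: the barriers are approximated with C11
fences (an assumption), and struct fake_dev, start_dma() and read_dma() are
hypothetical names standing in for a device with a "go" register, a "status"
register and a DMA buffer.

```c
/* Userspace model only; real kernel accessors are architecture-specific. */
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: C11 fences stand in for the kernel's wmb()/rmb(). */
#define wmb() atomic_thread_fence(memory_order_release)
#define rmb() atomic_thread_fence(memory_order_acquire)

/* Relaxed accessors: a plain volatile access, no ordering against memory. */
static inline void writel_relaxed(uint32_t v, volatile uint32_t *addr)
{
	*addr = v;
}

static inline uint32_t readl_relaxed(const volatile uint32_t *addr)
{
	return *addr;
}

/* writel: barrier *before* the store, so earlier buffer writes go first. */
static inline void writel(uint32_t v, volatile uint32_t *addr)
{
	wmb();
	writel_relaxed(v, addr);
}

/* readl: barrier *after* the load, so later buffer reads cannot pass it. */
static inline uint32_t readl(const volatile uint32_t *addr)
{
	uint32_t v = readl_relaxed(addr);

	rmb();
	return v;
}

#define DMA_WORDS 4

/* Hypothetical device: two "MMIO" registers plus a DMA buffer. */
struct fake_dev {
	volatile uint32_t go;
	volatile uint32_t status;
	uint32_t dma_buf[DMA_WORDS];
};

/* Fill the DMA buffer, then hit the "go" register with the ordered writel. */
static void start_dma(struct fake_dev *dev, const uint32_t *src, size_t n)
{
	assert(n <= DMA_WORDS);
	for (size_t i = 0; i < n; i++)
		dev->dma_buf[i] = src[i];
	writel(1, &dev->go);
}

/* Check "status" with the ordered readl, then read the DMA buffer. */
static int read_dma(struct fake_dev *dev, uint32_t *dst, size_t n)
{
	assert(n <= DMA_WORDS);
	if (!(readl(&dev->status) & 1))
		return -1;	/* device has not signalled completion */
	for (size_t i = 0; i < n; i++)
		dst[i] = dev->dma_buf[i];
	return 0;
}
```

The point of the split is that only the register access that publishes or
consumes the buffer pays for a barrier; the buffer accesses themselves stay
cheap.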