On Wed, 22 Aug 2018, Arnd Bergmann wrote:

> > According to the Alpha handbook, non-overlapping accesses may be
> > reordered.

 I have had a notion of this since forever, however I have had troubles
tracking down the exact reference in the architecture specification.

> > So if someone does
> > writel(REG1);
> > readl(REG2);
> >
> > readl may (according to the spec) reach the device before writel. Although
> > actual experiments suggests that the read flushes the queued writes.

 Individual implementations can surely be more strongly ordered than the
architecture specification requires.

> > Does ARM have some hardware magic that prevents reordering the write and
> > the read in this case?
>
> Most architecture have this AFAICT, ARM and x86 definitely do, and
> PCI requires this to be true on the bus:
>
> All MMIO accesses from a given CPU to a given device (according
> to an architecture-specific definition of "device") are ordered with respect
> to one another.
>
> If the hardware does not guarantee that, for simple load/store operations
> on uncached device memory, then we need a full barrier after each store
> in addition to the write barrier needed for the DMA synchronization.

 MIPS is architecturally even more weakly ordered and a set of barrier
instructions has been defined for synchronisation: SYNC for a completion
barrier, and SYNC_ACQUIRE, SYNC_RELEASE, SYNC_RMB, SYNC_WMB and SYNC_MB
for various ordering barriers.  Older architecture revisions had this less
standardised.  Many if not most implementations are more strongly ordered
though, in which case the relevant SYNC instructions are effectively NOPs.

 I'd expect some other architectures to be similarly weakly ordered.

 FWIW,

  Maciej
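
 For illustration only, the "full barrier after each store" idea quoted
above might be sketched in plain user-space C roughly as follows.  This is
not kernel code: fake_reg1, fake_reg2, reg_write_ordered(), reg_read() and
write_then_read() are hypothetical names, the "registers" are ordinary
variables, and the C11 fence merely marks where something like mb() would
sit.  Whether any given instruction actually orders accesses to uncached
device memory is architecture-specific and is not shown here.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-ins for two device registers.  Real MMIO would use
 * ioremap()ed addresses and the kernel's writel()/readl() accessors. */
static volatile uint32_t fake_reg1, fake_reg2;

/* Hypothetical helper: store to a register, then issue a full fence so
 * that a later load is not ordered ahead of the store.  The C11 fence is
 * only a stand-in for an architecture's full barrier. */
static inline void reg_write_ordered(volatile uint32_t *reg, uint32_t val)
{
	*reg = val;
	atomic_thread_fence(memory_order_seq_cst);
}

/* Hypothetical helper: plain register read, no extra barrier. */
static inline uint32_t reg_read(volatile uint32_t *reg)
{
	return *reg;
}

/* The writel(REG1); readl(REG2); sequence from the thread. */
static uint32_t write_then_read(uint32_t val)
{
	reg_write_ordered(&fake_reg1, val);
	return reg_read(&fake_reg2);
}
```

 On an implementation that is already strongly ordered the fence would be
redundant, much like the SYNC instructions that degenerate to NOPs.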