On Wed, Aug 22, 2018 at 5:50 PM Mikulas Patocka <mpatocka@xxxxxxxxxx> wrote:
> On Wed, 22 Aug 2018, Maciej W. Rozycki wrote:
> > On Wed, 22 Aug 2018, Sinan Kaya wrote:
>
> According to the Alpha handbook, non-overlapping accesses may be
> reordered.
>
> So if someone does
> writel(REG1);
> readl(REG2);
>
> readl may (according to the spec) reach the device before writel, although
> actual experiments suggest that the read flushes the queued writes.
>
> I would be quite interested to know why Linux developers decided that readl
> should be implemented as "read+barrier" and writel should be implemented
> as "barrier+write". Why is there this asymmetry in the barriers?

I can explain this part: those two barriers are used specifically to order
an MMIO access against a DMA access.

A writel() may be used to start a DMA operation copying data from RAM to
the device, so we must have a barrier between the store to that data and
the store to the register, to ensure the data is visible to the device
before the device is told to fetch it.

Similarly, a readl() may check the status of a register that tells us when
a DMA from the device to RAM has completed. We must have a read barrier
between that MMIO load and the subsequent load from RAM, to prevent the
data from being prefetched while the MMIO read is still in progress.

> Does ARM have some hardware magic that prevents reordering the write and
> the read in this case?

Most architectures have this AFAICT; ARM and x86 definitely do, and PCI
requires this to be true on the bus: all MMIO accesses from a given CPU
to a given device (according to an architecture-specific definition of
"device") are ordered with respect to one another.

If the hardware does not guarantee that for simple load/store operations
on uncached device memory, then we need a full barrier after each store,
in addition to the write barrier needed for the DMA synchronization.

Arnd
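To make the "barrier+write" / "read+barrier" asymmetry concrete, here is a minimal userspace sketch, not kernel code. It assumes a model in which a second thread plays the role of the device and two C11 atomics play the role of its MMIO registers; `model_writel()`, `model_readl()`, `fake_device()` and `run_dma_demo()` are hypothetical names. C11 fences only order memory as observed by other threads, not by real hardware, and the kernel's actual readl()/writel() use architecture-specific I/O barriers instead. The sketch also covers only the DMA-ordering barriers explained above, not the MMIO-to-MMIO ordering that the hardware provides.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical model: a thread stands in for the device, and two
 * relaxed atomics stand in for its MMIO registers. */
#define BUF_WORDS 8

static uint32_t dma_in[BUF_WORDS];    /* RAM the "device" reads (RAM -> device) */
static uint32_t dma_out;              /* RAM the "device" writes (device -> RAM) */
static _Atomic uint32_t reg_doorbell; /* "start DMA" register */
static _Atomic uint32_t reg_status;   /* "DMA complete" register */

/* writel() model: barrier BEFORE the register store, so all earlier
 * stores to DMA memory are visible before the device is told to start. */
static void model_writel(uint32_t val, _Atomic uint32_t *reg)
{
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(reg, val, memory_order_relaxed);
}

/* readl() model: barrier AFTER the register load, so later loads from
 * DMA memory cannot be satisfied before the status has been read. */
static uint32_t model_readl(_Atomic uint32_t *reg)
{
    uint32_t val = atomic_load_explicit(reg, memory_order_relaxed);
    atomic_thread_fence(memory_order_acquire);
    return val;
}

/* The pretend device: wait for the doorbell, consume the input buffer,
 * write a result to RAM, then set the status register. */
static void *fake_device(void *arg)
{
    uint32_t cmd;
    uint32_t sum = 0;

    (void)arg;
    while ((cmd = model_readl(&reg_doorbell)) == 0)
        ;                             /* spin until the "start" write */
    assert(cmd == 1);                 /* expect the start command */
    for (int i = 0; i < BUF_WORDS; i++)
        sum += dma_in[i];             /* "DMA" from RAM to device */
    dma_out = sum;                    /* "DMA" from device to RAM */
    model_writel(1, &reg_status);     /* signal completion */
    return NULL;
}

/* Driver side: fill the buffer, ring the doorbell, poll the status,
 * then read the result. Returns 0 if the device thread cannot start. */
uint32_t run_dma_demo(void)
{
    pthread_t dev;
    uint32_t result;

    atomic_store(&reg_doorbell, 0);
    atomic_store(&reg_status, 0);
    dma_out = 0;
    if (pthread_create(&dev, NULL, fake_device, NULL) != 0)
        return 0;

    for (int i = 0; i < BUF_WORDS; i++)
        dma_in[i] = (uint32_t)(i + 1); /* prepare data in RAM */
    model_writel(1, &reg_doorbell);    /* barrier, then "start DMA" store */

    while (model_readl(&reg_status) == 0)
        ;                              /* status load, then barrier */
    result = dma_out;                  /* ordered after the status load */

    pthread_join(dev, NULL);
    return result;
}
```

Moving either fence to the other side of its register access would break the pairing: the "device" could then see a stale `dma_in`, or the driver could read `dma_out` before the completion is observed.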