On Wed, 22 Aug 2018, Arnd Bergmann wrote:

> On Wed, Aug 22, 2018 at 5:50 PM Mikulas Patocka <mpatocka@xxxxxxxxxx> wrote:
> > On Wed, 22 Aug 2018, Maciej W. Rozycki wrote:
> > > On Wed, 22 Aug 2018, Sinan Kaya wrote:
> >
> > According to the Alpha handbook, non-overlapping accesses may be
> > reordered.
> >
> > So if someone does
> > writel(REG1);
> > readl(REG2);
> >
> > readl may (according to the spec) reach the device before writel. Although
> > actual experiments suggest that the read flushes the queued writes.
> >
> > I would be quite interested why Linux developers decided that readl
> > should be implemented as "read+barrier" and writel should be implemented
> > as "barrier+write". Why is there this asymmetry in the barriers?
>
> I can explain this part: those two barriers are used specifically to order
> an MMIO access against a DMA access: a writel() may be used to start
> a DMA operation copying data from RAM to the device, so we must
> have a barrier between the store to that data and the store to the register
> to ensure the data is visible to the device.
> Similarly, a readl() may check the status of a register that tells us when
> a DMA from device to RAM has completed. We must have a read
> barrier between that mmio load and the load from RAM to prevent
> the data from being prefetched while the MMIO is still in progress.

Then the question is: why not just use barriers before and after
accesses to DMA'd memory? For DMA into non-coherent memory, the barrier
could be injected into the dma_map_* and dma_unmap_* functions (with no
change in drivers), and for DMA into coherent memory you could have
something like dma_coherent_barrier().

Why does Linux add the barriers around every read and write to
memory-mapped registers?

> > Does ARM have some hardware magic that prevents reordering the write and
> > the read in this case?
>
> Most architectures have this AFAICT, ARM and x86 definitely do, and
> PCI requires this to be true on the bus:
>
> All MMIO accesses from a given CPU to a given device (according
> to an architecture-specific definition of "device") are ordered with respect
> to one another.

If ARM guarantees that the accesses to a given device are not reordered,
then the barriers in readl and writel are superfluous.

> If the hardware does not guarantee that, for simple load/store operations
> on uncached device memory, then we need a full barrier after each store
> in addition to the write barrier needed for the DMA synchronization.
>
>        Arnd

Mikulas
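
For reference, the MMIO-vs-DMA ordering that Arnd describes could be
sketched in plain C roughly as below. This is a userspace model under
stated assumptions, not the kernel's implementation: struct fake_dev,
sketch_writel(), sketch_readl(), start_tx() and poll_rx() are hypothetical
names, and C11 fences merely stand in for the architecture's wmb()/rmb().
C11 fences give no formal guarantee for real device memory; the sketch
only shows where each barrier sits relative to the register access.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for a device register block; not a real driver. */
struct fake_dev {
	volatile uint32_t doorbell;	/* CPU writes this to start a DMA */
	volatile uint32_t status;	/* CPU reads this to see a DMA is done */
};

/* "barrier + write": earlier stores to the DMA buffer are ordered first. */
static inline void sketch_writel(uint32_t val, volatile uint32_t *reg)
{
	atomic_thread_fence(memory_order_release);	/* stands in for wmb() */
	*reg = val;
}

/* "read + barrier": later loads from RAM are ordered after the MMIO load. */
static inline uint32_t sketch_readl(const volatile uint32_t *reg)
{
	uint32_t val = *reg;

	atomic_thread_fence(memory_order_acquire);	/* stands in for rmb() */
	return val;
}

/* RAM -> device: fill the buffer, then ring the doorbell. */
static inline void start_tx(struct fake_dev *dev, uint32_t *buf, uint32_t data)
{
	assert(dev && buf);
	buf[0] = data;				/* store to the DMA data */
	sketch_writel(1, &dev->doorbell);	/* barrier, then register store */
}

/* Device -> RAM: check the status register, only then read the buffer. */
static inline int poll_rx(struct fake_dev *dev, const uint32_t *buf,
			  uint32_t *out)
{
	assert(dev && buf && out);
	if (!(sketch_readl(&dev->status) & 1))	/* register load, then barrier */
		return 0;			/* DMA not complete yet */
	*out = buf[0];				/* load from the DMA data */
	return 1;
}
```

In this model the barrier sits on the side of the register access that
faces the DMA data: before the store in sketch_writel(), after the load
in sketch_readl(). That placement is the asymmetry asked about above.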