On Mon, 20 Aug 2018, Sinan Kaya wrote:

> >  Likewise see memory-barriers.txt throughout concerning `mmiowb' (which is
> > an obviously lighter weight barrier compared to `readX').
>
> Here is a better reference from memory-barriers.txt
>
>  (*) readX(), writeX():
>
>      Whether these are guaranteed to be fully ordered and uncombined with
>      respect to each other on the issuing CPU depends on the characteristics
>      defined for the memory window through which they're accessing.  On later
>      i386 architecture machines, for example, this is controlled by way of the
>      MTRR registers.
>
>      Ordinarily, these will be guaranteed to be fully ordered and uncombined,
>      provided they're not accessing a prefetchable device.

 See the next sentence too, and I am concerned about the "characteristics
defined for the memory window" qualification here -- how is the memory
window defined in the general sense?  For i386 we have the MTRR registers,
but how about other platforms?

 Anyway, if we were to guarantee that `readX' and `writeX' were fully
ordered, then we would have to place barriers in matching places across
accessors, i.e. either before or after the actual MMIO access, but
uniformly across all of them, rather than having them mixed.  Placing them
beforehand is normally better as buffers will often have drained already
by that time, meaning the performance cost of the barrier will be lower.

 As of commit 92d7223a7423 ("alpha: io: reorder barriers to guarantee
writeX() and iowriteX() ordering #2") we have barriers in mixed positions:
placed beforehand in write accesses and afterwards in read accesses,
meaning that if we issue, say:

	writel(x, foo);
	y = readl(bar);

then the read from `bar' can be reordered ahead of the write to `foo',
which is very, very bad, breaking requirements set out across
io_ordering.txt and memory-barriers.txt.  I am fairly sure this is the
cause of the regression observed.

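 The reordering can be illustrated with a toy user-space model.  This is
only a sketch, not Alpha or kernel code: `struct bus', `model_mb',
`raw_write', `raw_read' and the `run_*' helpers are hypothetical names, and
the model assumes that writes are posted into a buffer which only a barrier
drains, while reads reach the device immediately.

```c
#include <assert.h>
#include <string.h>

/* Toy model (an assumption for illustration, not real hardware):
 * posted writes sit in `pending' until a barrier drains them;
 * `seen' records the order in which the device observes accesses. */
enum { BUF_MAX = 8, LOG_MAX = 8 };

struct bus {
	char pending[BUF_MAX];		/* posted writes not yet at the device */
	int npending;
	char seen[LOG_MAX + 1];		/* 'W'/'R' in device-observed order */
	int nseen;
};

static void bus_init(struct bus *b)
{
	memset(b, 0, sizeof(*b));	/* also keeps `seen' NUL-terminated */
}

/* Barrier: push every posted write out to the device. */
static void model_mb(struct bus *b)
{
	for (int i = 0; i < b->npending; i++) {
		assert(b->nseen < LOG_MAX);
		b->seen[b->nseen++] = b->pending[i];
	}
	b->npending = 0;
}

/* Raw accesses with no ordering of their own. */
static void raw_write(struct bus *b)
{
	assert(b->npending < BUF_MAX);
	b->pending[b->npending++] = 'W';	/* posted, not yet visible */
}

static void raw_read(struct bus *b)
{
	assert(b->nseen < LOG_MAX);
	b->seen[b->nseen++] = 'R';		/* bypasses the write buffer */
}

/* Mixed placement: barrier before the write, but after the read. */
const char *run_mixed(struct bus *b)
{
	bus_init(b);
	model_mb(b); raw_write(b);	/* writel(x, foo) */
	raw_read(b); model_mb(b);	/* y = readl(bar) */
	return b->seen;
}

/* Uniform placement: barrier before both accesses. */
const char *run_uniform(struct bus *b)
{
	bus_init(b);
	model_mb(b); raw_write(b);	/* writel(x, foo) */
	model_mb(b); raw_read(b);	/* y = readl(bar) */
	return b->seen;
}
```

 In this model `run_mixed' leaves the device having seen the read first
("RW"), whereas `run_uniform', with the barrier ahead of both accessors,
preserves program order ("WR").
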
 You need to make a corresponding update to `readX' and `ioreadX' then
(and once that has been fixed we can consider the general matter of MMIO
barriers independently).

  Maciej