On Wed, 13 May 2020, Ivan Kokshaysky wrote: > > Individual PCI port locations correspond to different MMIO locations, so > > yes, accesses to these can be reordered (merging won't happen due to the > > use of the sparse address space). > > Correct, it's how Alpha write buffers work. According to 21064 hardware > reference manual, these buffers are flushed when one of the following > conditions is met: > > 1) The write buffer contains at least two valid entries. > 2) The write buffer contains one valid entry and at least 256 CPU cycles > have elapsed since the execution of the last write buffer-directed > instruction. > 3) The write buffer contains an MB, STQ_C or STL_C instruction. > 4) A load miss is pending to an address currently valid in the write > buffer that requires the write buffer to be flushed. > > I'm certain that in these rtc/serial cases we've got readX arriving > to device *before* preceeding writeX because of 2). That's why small > delay (300-1400 ns, apparently depends on CPU frequency) seemingly > "fixes" the problem. The 4) is not met because loads and stores are > to different ports, and 3) has been broken by commit 92d7223a74. > > So I believe that correct fix would be to revert 92d7223a74 and > add wmb() before [io]writeX macros to meet memory-barriers.txt > requirement. The "wmb" instruction is cheap enough and won't hurt > IO performance too much. Hmm, having barriers *afterwards* across all the MMIO accessors, even ones that do not have such a requirement according to memory-barriers.txt, does hurt performance unnecessarily however. What I think has to be done is adding barriers beforehand, and then only add barriers afterwards where necessary. Commit 92d7223a74 did a part of that, but did not consistently update all the remaining accessors. So I don't think reverting 92d7223a74 permanently is the right way to go, however it certainly makes sense to do that temporarily to get rid of the fatal regression, sort all the corner cases and then apply the resulting replacement fix. I think ultimately we do want the barriers beforehand, just like the MIPS port has (and survives) in arch/mips/include/asm/io.h. Observe that unlike the Alpha ISA the MIPS ISA does have nontrivial `rmb' aka the SYNC_RMB hardware instruction. Maciej