On Mon, May 11, 2020 at 03:58:24PM +0100, Maciej W. Rozycki wrote: > Individual PCI port locations correspond to different MMIO locations, so > yes, accesses to these can be reordered (merging won't happen due to the > use of the sparse address space). Correct, it's how Alpha write buffers work. According to 21064 hardware reference manual, these buffers are flushed when one of the following conditions is met: 1) The write buffer contains at least two valid entries. 2) The write buffer contains one valid entry and at least 256 CPU cycles have elapsed since the execution of the last write buffer-directed instruction. 3) The write buffer contains an MB, STQ_C or STL_C instruction. 4) A load miss is pending to an address currently valid in the write buffer that requires the write buffer to be flushed. I'm certain that in these rtc/serial cases we've got readX arriving to device *before* preceeding writeX because of 2). That's why small delay (300-1400 ns, apparently depends on CPU frequency) seemingly "fixes" the problem. The 4) is not met because loads and stores are to different ports, and 3) has been broken by commit 92d7223a74. So I believe that correct fix would be to revert 92d7223a74 and add wmb() before [io]writeX macros to meet memory-barriers.txt requirement. The "wmb" instruction is cheap enough and won't hurt IO performance too much. Ivan.