On Fri, 22 May 2020, Mikulas Patocka wrote:

> On Wed, 13 May 2020, Ivan Kokshaysky wrote:
>
> > On Mon, May 11, 2020 at 03:58:24PM +0100, Maciej W. Rozycki wrote:
> > > Individual PCI port locations correspond to different MMIO locations, so
> > > yes, accesses to these can be reordered (merging won't happen due to the
> > > use of the sparse address space).
> >
> > Correct, it's how Alpha write buffers work. According to 21064 hardware
> > reference manual, these buffers are flushed when one of the following
> > conditions is met:
> >
> > 1) The write buffer contains at least two valid entries.
> > 2) The write buffer contains one valid entry and at least 256 CPU cycles
> >    have elapsed since the execution of the last write buffer-directed
> >    instruction.
> > 3) The write buffer contains an MB, STQ_C or STL_C instruction.
> > 4) A load miss is pending to an address currently valid in the write
> >    buffer that requires the write buffer to be flushed.
> >
> > I'm certain that in these rtc/serial cases we've got readX arriving
> > to device *before* preceding writeX because of 2). That's why small
> > delay (300-1400 ns, apparently depends on CPU frequency) seemingly
> > "fixes" the problem. The 4) is not met because loads and stores are
> > to different ports, and 3) has been broken by commit 92d7223a74.
> >
> > So I believe that correct fix would be to revert 92d7223a74 and
> > add wmb() before [io]writeX macros to meet memory-barriers.txt
> > requirement. The "wmb" instruction is cheap enough and won't hurt
> > IO performance too much.
> >
> > Ivan.
>
> I agree ... and what about readX_relaxed and writeX_relaxed? According to
> the memory-barriers specification, the _relaxed functions must be ordered
> w.r.t. each other. If Alpha can't keep them ordered, they should have
> barriers between them too.
>
> Mikulas

I have found the chapter about I/O in the Alpha reference manual. Section
"5.6.4.7 Implications for Memory Mapped I/O" says:

    To reliably communicate shared data from a processor to an I/O device:
    1. Write the shared data to a memory-like physical memory region on
       the processor.
    2. Execute an MB instruction.
    3. Write a flag (equivalently, send an interrupt or write a register
       location implemented in the I/O device).

    The receiving I/O device must:
    1. Read the flag (equivalently, detect the interrupt or detect the
       write to the register location implemented in the I/O device).
    2. Execute the equivalent of an MB.
    3. Read the shared data.

So, we must use MB, not WMB, when ordering I/O accesses. The WMB
instruction specification says that it orders writes to memory-like
locations with other writes to memory-like locations, and writes to
non-memory-like locations with other writes to non-memory-like locations.
But it doesn't order memory-like writes with I/O-like writes.

Section 5.6.3 states that there are no implied barriers.

So, if we want to support the specifications:

read*_relaxed and write*_relaxed need at least one barrier before or
after (memory-barriers.txt says that they must be ordered with each
other; the Alpha specification doesn't specify ordering).

read and write - read must have a barrier after it, write before it.
There must be one more barrier before a read or after a write, to make
sure that there is a barrier between write+read.

Mikulas
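To make the barrier placement for the non-relaxed accessors concrete, here is a rough user-space sketch in C. It is not the arch/alpha code and not a proposed patch: the names sketch_readl, sketch_writel, sketch_publish, sketch_selfcheck and mb_count are made up for illustration, the "register" is just a volatile variable, and mb() is approximated with a C11 sequentially consistent fence instead of the real MB instruction. Of the two options for the extra barrier, it arbitrarily picks "before a read".

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Counts executed barriers; exists only so the sketch can be checked. */
static unsigned long mb_count;

/* Assumption: a full fence stands in for the Alpha MB instruction. */
static inline void mb(void)
{
    atomic_thread_fence(memory_order_seq_cst);
    mb_count++;
}

/* write: barrier before it, so earlier memory writes are ordered first. */
static inline void sketch_writel(uint32_t val, volatile uint32_t *addr)
{
    mb();
    *addr = val;
}

/*
 * read: barrier after it, plus one more barrier before it so that a
 * preceding write and this read always have a barrier between them.
 */
static inline uint32_t sketch_readl(const volatile uint32_t *addr)
{
    uint32_t val;

    mb();
    val = *addr;
    mb();
    return val;
}

/* The manual's sequence: shared data to memory, MB, then the flag write. */
static inline void sketch_publish(uint32_t *shared, uint32_t data,
                                  volatile uint32_t *flag_reg)
{
    *shared = data;              /* step 1: memory-like write */
    sketch_writel(1, flag_reg);  /* steps 2 and 3: MB, then flag */
}

/* Small usage example: publish, then read the flag back. */
static inline void sketch_selfcheck(void)
{
    volatile uint32_t reg = 0;
    uint32_t shared = 0;
    uint32_t flag;

    mb_count = 0;
    sketch_publish(&shared, 42, &reg);  /* one barrier */
    assert(shared == 42);
    assert(reg == 1);
    assert(mb_count == 1);

    flag = sketch_readl(&reg);          /* two more barriers */
    assert(flag == 1);
    assert(mb_count == 3);
}
```

With this choice a write followed by a read executes MB, store, MB, load, MB, so there is a barrier between the write and the read. Putting the extra barrier after the write instead would give the same ordering and move the cost from reads to writes.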