On Fri, 22 May 2020, Mikulas Patocka wrote: > > Hmm, having barriers *afterwards* across all the MMIO accessors, even > > ones that do not have such a requirement according to memory-barriers.txt, > > does hurt performance unnecessarily however. What I think has to be done > > is adding barriers beforehand, and then only add barriers afterwards where > > necessary. Commit 92d7223a74 did a part of that, but did not consistently > > update all the remaining accessors. > > > > So I don't think reverting 92d7223a74 permanently is the right way to go, > > however it certainly makes sense to do that temporarily to get rid of the > > fatal regression, sort all the corner cases and then apply the resulting > > replacement fix. > > See Documentation/memory-barriers.txt, the section "KERNEL I/O BARRIER > EFFECTS" > > According to the specification, there must be a barrier before a write to > the MMIO space (paragraph 3) and after a read from MMIO space (parahraph > 4) - if this causes unnecessary slowdown, the driver should use > readX_relaxed or writeX_relaxed functions - the *_relaxed functions are > ordered with each other (see the paragraph "(*) readX_relaxed(), > writeX_relaxed()"), but they are not ordered with respect to memory > access. The specification doesn't require a barrier *after* a write however, which is what I have been concerned about as it may cost hundreds of cycles wasted. I'm not concerned about a barrier after a read (and one beforehand is of course also required). > The commit 92d7223a74 fixes that requirement (although there is no real > driver that was fixed by this), so I don't think it should be reverted. > The proper fix should be to add delays to the serial port and readltime > clock (or perhaps to all IO-port accesses). Adding artificial delays will only paper over the real problem I'm afraid. > > I think ultimately we do want the barriers beforehand, just like the > > MIPS port has (and survives) in arch/mips/include/asm/io.h. Observe > > If the MIPS port doesn't have MMIO barrier after read[bwl], then it is > violating the specification. Perhaps there is no existing driver that is > hurt by this violation, so this violation survived. It does have a barrier, see: /* prevent prefetching of coherent DMA data prematurely */ \ if (!relax) \ rmb(); \ In the light of #5 however: 5. A readX() by a CPU thread from the peripheral will complete before any subsequent delay() loop can begin execution on the same thread. I think it may have to be replaced with a completion barrier however, and that will be tricky because you cannot just issue a second read to the resource accessed after the `rmb' to make the first read complete, as a MMIO read may have side effects (e.g. clearing an interrupt request). So the read would have to happen to a different location. Some architectures have a hardware completion barrier instruction, such as the modern MIPS ISA, which makes life sweet and easy (as much as life can be sweet and easy with a weakly ordered bus model) as no dummy read is then required, but surely not all do (including older MIPS ISA revisions). Maciej