Re: [PATCH] add delay between port write and port read

"Maciej W. Rozycki" <macro@xxxxxxxxxxxxxx> · Wed, 27 Feb 2019 18:49:57 +0000 (GMT)

On Tue, 26 Feb 2019, Will Deacon wrote:

> > If they are the same device (just different data ports), I'd
> > *definitely* expect them to be ordered.
> > 
> > We have tons of code that depends on that. Almost every driver out
> > there, in fact.
> > 
> > So we need the mb() on alpha to guarantee the access ordering on the
> > CPU side, and then PCI itself ends up guaranteeing that accesses to
> > the same device will remain ordered outside the CPU.
> > 
> > Agreed?
> 
> Yup, agreed. I'd consider all those ports to be the same endpoint, so we're
> good.

 FAOD, I think this assumption/requirement only applies to the plain 
accessors (`inX', `readX', `ioreadX', etc.).

 For performance reasons we may at some point decide to opt in to accessors 
that are not required to be strongly ordered WRT each other, for the 
benefit of architectures whose MMIO is not strongly ordered and which suffer 
a lot from serialising accesses whose relative order does not matter, e.g. 
where you need to load a bunch of device registers or maybe even device RAM 
in any order before making a serialised final request to accept the values 
loaded.
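To make that access pattern concrete, here is a minimal userspace sketch. It assumes hypothetical weakly ordered accessors: `writel_unordered()', `mmio_fence()', `load_and_commit()' and the simulated registers are invented for illustration and do not exist in the kernel, and a C11 fence merely stands in for a real I/O barrier. The point is the shape: many accesses with no mutual ordering requirement, then a single barrier before the one write that has to be serialised.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Userspace simulation only.  writel_unordered() and mmio_fence() are
 * hypothetical names (no such kernel API exists); the "device" is a set
 * of plain volatile variables.
 */
#define DEV_RAM_WORDS 8u        /* hypothetical board RAM size in words */
#define CMD_ACCEPT    0x1u      /* hypothetical "accept loaded values" command */

static volatile uint32_t dev_ram[DEV_RAM_WORDS];  /* simulated device RAM */
static volatile uint32_t dev_cmd;                 /* simulated command register */

/* Hypothetical accessor: may be reordered against other unordered accesses. */
static inline void writel_unordered(uint32_t val, volatile uint32_t *addr)
{
	*addr = val;
}

/* Stand-in for an I/O barrier: one serialising point instead of one per access. */
#define mmio_fence() atomic_thread_fence(memory_order_seq_cst)

/* Load up to DEV_RAM_WORDS values in any order, then commit them once. */
static void load_and_commit(const uint32_t *vals, size_t n)
{
	if (n > DEV_RAM_WORDS)
		n = DEV_RAM_WORDS;
	for (size_t i = 0; i < n; i++)
		writel_unordered(vals[i], &dev_ram[i]);  /* mutual order irrelevant */
	mmio_fence();                                  /* loads above come first */
	dev_cmd = CMD_ACCEPT;                          /* serialised final request */
}
```

The barrier cost is paid once per batch rather than once per register, which is where a weakly ordered MMIO architecture would save the stall cycles.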

 I made provisions for that in a driver I recently added with commit 
61414f5ec983 ("FDDI: defza: Add support for DEC FDDIcontroller 700 
TURBOchannel adapter"), where locally defined accessor macros suffixed 
with `_o' and `_u' denote accesses that have to be strongly ordered and 
accesses that can be weakly ordered, respectively, WRT each other.

 Right now they all expand to the respective `_relaxed' accessors (with a 
lone `dma_rmb' inserted appropriately; yes, the device does DMA in one 
direction only, and the other direction is PIO with a lot of MMIO traffic 
to board RAM that would benefit from omitting barriers); however, they can 
be replaced with references to truly unordered accessors if we ever have 
them.
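A sketch of the `_o'/`_u' naming scheme follows, assuming userspace stand-ins for `writel_relaxed' and `dma_rmb' (the real kernel definitions live in arch headers and take `__iomem' pointers). Both suffixes alias the relaxed accessor, which is sound today because relaxed accessors to the same peripheral still remain ordered WRT each other; only the `_u' definitions would need to change if truly unordered accessors appeared. `struct rx_desc', `RX_OWN', `rx_poll' and `tx_copy' are hypothetical and are not taken from the defza driver.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Userspace stand-ins, NOT the kernel definitions. */
static inline void writel_relaxed(uint32_t val, volatile uint32_t *addr)
{
	*addr = val;
}

#define dma_rmb() atomic_thread_fence(memory_order_acquire)

/* `_o': accesses that must stay strongly ordered WRT each other. */
#define writel_o writel_relaxed
/* `_u': accesses that may be weakly ordered; a future unordered accessor
 * could be substituted here without touching the callers. */
#define writel_u writel_relaxed

#define RX_OWN 0x80000000u            /* hypothetical "device owns it" bit */

struct rx_desc {                      /* hypothetical DMA receive descriptor */
	volatile uint32_t status;     /* RX_OWN | frame length, device-written */
	uint32_t data;                /* payload word, device-written */
};

/* DMA path: returns the frame length, or -1 while the device owns @d. */
static long rx_poll(const struct rx_desc *d, uint32_t *out)
{
	uint32_t status = d->status;

	if (status & RX_OWN)
		return -1;
	dma_rmb();                    /* read status before the payload */
	*out = d->data;
	return (long)(status & 0xffffu);
}

/* PIO path: bulk copy to board RAM, then a final ordered command write. */
static void tx_copy(volatile uint32_t *board, const uint32_t *src, int n,
		    volatile uint32_t *cmd)
{
	for (int i = 0; i < n; i++)
		writel_u(src[i], &board[i]);   /* order among these irrelevant */
	writel_o(1u, cmd);                     /* must follow the copy */
}
```

Keeping the distinction in the macro names costs nothing at run time now, yet it records at each call site which accesses could later drop their ordering.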

 That piece of hardware is, however, rather peculiar and not an example of 
the most common design seen nowadays, and I am not sure whether the extra 
maintenance burden of any additional accessors across all the ports would 
be outweighed by the benefit for weakly ordered MMIO architectures (where 
an execution stall can indeed amount to hundreds of clock cycles per 
barrier inserted), combined with the level of appreciation (i.e. actual 
use) from driver writers who do not necessarily grok all that weak 
ordering business.

  Maciej