On 2018-08-21 05:12, Maciej W. Rozycki wrote:
> On Mon, 20 Aug 2018, Sinan Kaya wrote:
> Likewise see memory-barriers.txt throughout concerning `mmiowb' (which is
> an obviously lighter weight barrier compared to `readX').
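(For reference, the usual `mmiowb' pattern that memory-barriers.txt has in
mind looks roughly like this -- a sketch with a hypothetical device lock and
control register, not code from any particular driver:)

	spin_lock(&dev->lock);
	writel(val, dev->regs + CTRL);	/* posted MMIO write */
	mmiowb();			/* keep the write ordered before the unlock,
					 * so a second CPU taking the lock cannot
					 * have its own writes reach the device first */
	spin_unlock(&dev->lock);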
Here is a better reference from memory-barriers.txt
 (*) readX(), writeX():

     Whether these are guaranteed to be fully ordered and uncombined with
     respect to each other on the issuing CPU depends on the characteristics
     defined for the memory window through which they're accessing.  On later
     i386 architecture machines, for example, this is controlled by way of the
     MTRR registers.

     Ordinarily, these will be guaranteed to be fully ordered and uncombined,
     provided they're not accessing a prefetchable device.
> See the next sentence too, and I am concerned about the "characteristics
> defined for the memory window" qualification here -- how is the memory
> window defined in the general sense?  For i386 we have the MTRR registers,
> but how about other platforms?
>  Anyway, if we were to guarantee that `readX' and `writeX' were fully
> ordered, then we would have to place barriers in matching places across
> accessors, i.e. either before or after the actual MMIO access, but
> uniformly across all of them, rather than having them mixed.  Placing them
> beforehand is normally better as buffers will often have drained already
> by that time, meaning the performance cost of the barrier will be lower.
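(To illustrate the "uniformly beforehand" arrangement described above -- a
sketch only, not actual kernel code; `__raw_writel'/`__raw_readl' stand in
for the raw device access:)

	static inline void writel(u32 val, volatile void __iomem *addr)
	{
		mb();			/* earlier accesses have drained */
		__raw_writel(val, addr);
	}

	static inline u32 readl(const volatile void __iomem *addr)
	{
		mb();			/* earlier accesses have drained */
		return __raw_readl(addr);
	}

With the barrier in the same position in every accessor, any two
readX()/writeX() calls issued by one CPU stay ordered against each other.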
>  As from commit 92d7223a7423 ("alpha: io: reorder barriers to guarantee
> writeX() and iowriteX() ordering #2") we have barriers in mixed positions,
> placed beforehand and afterwards in write and read accesses respectively,
> meaning that if we issue say:
>
> 	writel(x, foo);
> 	y = readl(bar);
>
> then the read from `bar' can be reordered ahead of the write to `foo',
> which is very, very bad, breaking requirements set out across
> io_ordering.txt and memory-barriers.txt.  I am fairly sure this is the
> cause of the regression observed.
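(Paraphrasing the shape the accessors take after that commit -- a sketch, not
the literal alpha code: the write-side barrier comes before the store and the
read-side barrier after the load, so nothing at all separates the store to
`foo' from the following load of `bar':)

	static inline void writel(u32 val, volatile void __iomem *addr)
	{
		mb();			/* barrier before the store only */
		__raw_writel(val, addr);
	}

	static inline u32 readl(const volatile void __iomem *addr)
	{
		u32 ret = __raw_readl(addr);

		mb();			/* barrier after the load only */
		return ret;
	}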
The location of the barrier is for observability with respect to memory
rather than with respect to other register accesses.

Now, I am used to ARM, where the architecture guarantees that register
accesses are ordered via the Device-nGnRE memory type.

If this architecture can reorder register accesses, then we need a barrier
both before and after the register access to make readX()/writeX() strongly
ordered.

Please confirm the arch behavior.
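(The "barrier before and after" shape I mean is roughly this -- a sketch of
the fully ordered form only, not a patch:)

	static inline void writel(u32 val, volatile void __iomem *addr)
	{
		mb();			/* order against earlier memory and MMIO */
		__raw_writel(val, addr);
		mb();			/* order against later memory and MMIO */
	}

	static inline u32 readl(const volatile void __iomem *addr)
	{
		u32 ret;

		mb();			/* order against earlier memory and MMIO */
		ret = __raw_readl(addr);
		mb();			/* order against later memory and MMIO */
		return ret;
	}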
> You need to make a corresponding update to `readX' and `ioreadX' then
> (and once that has been fixed we can consider the general matter of MMIO
> barriers independently).
>
>   Maciej