Re: Alpha Avanti broken by 9ce8654323d69273b4977f76f11c9e2d345ab130

Mikulas Patocka <mpatocka@xxxxxxxxxx> · Wed, 22 Aug 2018 11:50:23 -0400 (EDT)

On Wed, 22 Aug 2018, Maciej W. Rozycki wrote:

> On Wed, 22 Aug 2018, Sinan Kaya wrote:
> 
> > > It's hard to tell. The Alpha manual says that only overlapping accesses
> > > are ordered.
> > > 
> > > I did some tests on a framebuffer and found out that "read+read+write+write"
> > > is faster than "read+write+read+write" - that may suggest that the reads
> > > flush the write queue.
> > 
> > Do you know if the framebuffer BAR you are using is non-prefetchable? (you
> > can find out from lspci)
> > 
> > Ordering rules apply only to non-prefetchable BARs. Architectures are
> > allowed to do whatever they want for prefetchable BARs.
> 
>  Well, data accesses have to reach the relevant PCI host bridge first 
> (i.e. leave the CPU and pass through any intermediate bus bridges between 
> the CPU and the PCI bus tree accessed) for any PCI data ordering rules to 
> apply.  Depending on the system architecture this may or may not require 
> OS software intervention.  NB this is a general observation, not specific 
> to Alpha.
> 
>  "Alpha Architecture Handbook" has an extensive discussion on data 
> ordering, concerning both memory and MMIO (termed "memory-like region" and 
> "non-memory-like region" respectively in the said document), and I'll try 
> to get through all of it in the coming days to see if I can get to a 
> conclusion which will let us avoid excessive synchronisation.
> 
>  Meanwhile I'll be happy of course to accept any input backed with 
> suitable references.
> 
>   Maciej

According to the Alpha handbook, non-overlapping accesses may be 
reordered.

So if someone does

	writel(val, REG1);
	readl(REG2);

the readl may (according to the spec) reach the device before the writel.
However, actual experiments suggest that the read flushes the queued writes.

I would be quite interested to know why Linux developers decided that readl
should be implemented as "read+barrier" and writel as "barrier+write". Why
is there this asymmetry in the barriers?

Does ARM have some hardware magic that prevents the write and the read
from being reordered in this case?

Will Deacon made the change to "memory-barriers.txt" that specifies this
requirement. Will, could you please explain why you specified it this way?

Mikulas