On Wed, 22 Aug 2018, Sinan Kaya wrote:

> > It's hard to tell. The Alpha manual says that only overlapping accesses
> > are ordered.
> >
> > I did some tests on framebuffer and found out that "read+read+write+write"
> > is faster than "read+write+read+write" - that may suggest that the reads
> > flush the write queue.
>
> Do you know if the framebuffer BAR you are using is non-prefetchable? (you
> can find out from lspci)
>
> Ordering rule only applies to non-prefetchable BARs. Architectures are
> allowed to do whatever they want for prefetchable BARs.

Well, data accesses have to reach the relevant PCI host bridge first
(i.e. leave the CPU and pass through any intermediate bus bridges between
the CPU and the PCI bus tree accessed) before any PCI data ordering rules
can apply. Depending on the system architecture this may or may not
require OS software intervention. NB this is a general observation, not
specific to Alpha.

The "Alpha Architecture Handbook" has an extensive discussion of data
ordering, covering both memory and MMIO (termed "memory-like region" and
"non-memory-like region" respectively in that document). I'll try to get
through all of it in the coming days to see if I can reach a conclusion
that lets us avoid excessive synchronisation.

Meanwhile I'll of course be happy to accept any input backed with
suitable references.

Maciej