On Wed, 22 Aug 2018, Sinan Kaya wrote:

> On 8/22/2018 7:59 AM, Mikulas Patocka wrote:
> >
> >
> > On Tue, 21 Aug 2018, Arnd Bergmann wrote:
> >
> > > On Tue, Aug 21, 2018 at 3:40 PM Mikulas Patocka <mpatocka@xxxxxxxxxx> wrote:
> > > > On Tue, 21 Aug 2018, Arnd Bergmann wrote:
> > > > > On Mon, Aug 20, 2018 at 11:42 PM Mikulas Patocka <mpatocka@xxxxxxxxxx> wrote:
> > > > > > On Mon, 20 Aug 2018, Arnd Bergmann wrote:
> > > > > > > On Mon, Aug 20, 2018 at 4:17 PM Mikulas Patocka <mpatocka@xxxxxxxxxx> wrote:
> > > > > > > > On Sun, 19 Aug 2018, okaya@xxxxxxxxxxxxxx wrote:
> > > > >
> > > > > Ok, this does strongly suggest that it is the outb() operation that I
> > > > > suspected after all, I just sent you a wrong patch to test, failing
> > > > > to realize that alpha has two implementations of outb, and that the
> > > > > extern one is the one that gets used in a defconfig build.
> > > > >
> > > > > Could you try again with this patch added in? (Sorry for the
> > > > > whitespace damage, you'll have to apply it by hand). Presumably a
> > > > > wmb() is sufficient here, but I'm trying to play safe here by
> > > > > restoring the barrier that was part of outb() before it broke.
> > > > >
> > > > >      Arnd
> > > >
> > > > This patch fixes both hangs.
> > >
> > > Ok, thanks for confirming. Now the question is whether this is only needed
> > > for I/O space writes to ensure that an outb() etc completes before we
> > > start the next instruction as in my first theory above, or if the readl()
> > > getting moved ahead of a prior writel() as Maciej suggested could also
> > > happen for memory space.
> > >
> > > My guess would be that even on something as weakly ordered as Alpha,
> > > the PCI semantics are kept that a load from a non-prefetchable PCI MMIO
> > > space can not get moved ahead of a preceding store to the same PCI
> > > device from the same CPU, but I don't really know enough about alpha.
> > >
> > >      Arnd
> >
> > It's hard to tell. The Alpha manual says that only overlapping accesses
> > are ordered.
> >
> > I did some tests on framebuffer and found out that "read+read+write+write"
> > is faster than "read+write+read+write" - that may suggest that the reads
> > flush the write queue.
>
> Do you know if the framebuffer BAR you are using is non-prefetchable? (You
> can find out from lspci.)
>
> The ordering rule applies only to non-prefetchable BARs. Architectures are
> allowed to do whatever they want for prefetchable BARs.

Alpha doesn't differentiate between prefetchable and non-prefetchable
memory. ioremap_wc() is the same as ioremap_uc() on Alpha.

Alpha seems to do write queuing on the framebuffer but there's no read
queuing.

Mikulas
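
For reference, the two access patterns compared in the framebuffer test
quoted above can be sketched roughly as follows. This is only a minimal
userspace illustration, not the code that was actually run: the function
names (copy_interleaved, copy_grouped, time_pattern) are hypothetical, and
the buffers are ordinary memory accessed through volatile pointers as a
stand-in for an ioremap'd framebuffer, where a kernel test would use
readl()/writel() instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Common signature for both patterns (hypothetical, for illustration). */
typedef uint32_t (*copy_fn)(volatile uint32_t *dst,
                            const volatile uint32_t *src, size_t n);

/* "read+write+read+write": every load is directly followed by a store. */
static uint32_t copy_interleaved(volatile uint32_t *dst,
                                 const volatile uint32_t *src, size_t n)
{
    uint32_t sum = 0;

    assert(n % 2 == 0);            /* words are processed in pairs */
    for (size_t i = 0; i < n; i += 2) {
        uint32_t a = src[i];       /* read  */
        dst[i] = a;                /* write */
        uint32_t b = src[i + 1];   /* read  */
        dst[i + 1] = b;            /* write */
        sum += a + b;
    }
    return sum;
}

/* "read+read+write+write": both loads are issued before either store. */
static uint32_t copy_grouped(volatile uint32_t *dst,
                             const volatile uint32_t *src, size_t n)
{
    uint32_t sum = 0;

    assert(n % 2 == 0);            /* words are processed in pairs */
    for (size_t i = 0; i < n; i += 2) {
        uint32_t a = src[i];       /* read  */
        uint32_t b = src[i + 1];   /* read  */
        dst[i] = a;                /* write */
        dst[i + 1] = b;            /* write */
        sum += a + b;
    }
    return sum;
}

/* CPU time in seconds spent running one pattern `reps` times. */
static double time_pattern(copy_fn fn, volatile uint32_t *dst,
                           const volatile uint32_t *src, size_t n, int reps)
{
    uint32_t sink = 0;
    clock_t start = clock();

    for (int r = 0; r < reps; r++)
        sink += fn(dst, src, n);
    (void)sink;                    /* result is only used to keep the loads */
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Comparing time_pattern(copy_grouped, ...) against
time_pattern(copy_interleaved, ...) on the same buffers gives the kind of
measurement described above. On ordinary cached RAM the two patterns will
most likely time about the same; a difference should only be expected on
uncached device memory where a read has to wait for queued writes to drain.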