On Wed, 8 Aug 2018, David Laight wrote:

> From: Catalin Marinas
> > Sent: 08 August 2018 13:17
> ...
> > I think hazarding is what goes wrong here, especially with
> > overlapping unaligned addresses. However, I disagree that it is
> > impossible to implement this properly on a platform with PCIe so that
> > Normal NC mappings can be used.
>
> I've been trying to follow this discussion...
>
> Is the problem just that reads don't snoop/flush the write-combining buffer?

No. The pixel corruption is permanently visible on the monitor (even if
there are no reads from the framebuffer at all). So it can't be explained
as mishandling of a read-after-write hazard.

> Aligned writes that end on an appropriate boundary will leave the write
> combining buffer empty.
> But if the buffer isn't emptied the PCIe read gets ahead of the PCIe write.
>
> ISTR even x86 requires a fence instruction in some sequence associated
> with write-combining writes.

Other x86 cores may observe wc writes out of order - but a single x86 core
is self-consistent - i.e. if you do
	movl $0x00000000, (%ebx)
	movl $0xFFFFFFFF, 3(%ebx)
then the byte at ebx+3 will always contain 0xFF. The core can't just
corrupt data while doing reordering.

The problem on ARM is that I see data corruption when the overlapping
unaligned writes are done by just a single core.

> 	David

Mikulas
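
For reference, the two overlapping stores from the x86 example can be sketched in C roughly as follows. This is only an illustration under stated assumptions: it runs against an ordinary cacheable buffer, not a write-combining or Normal NC framebuffer mapping, so it shows the result a self-consistent core has to produce and does not reproduce the corruption. The names overlapping_stores and check_overlapping_stores are hypothetical, and memcpy stands in for the unaligned movl so that the C stays well-defined.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: issue the two overlapping 32-bit stores from the
 * example. The second store starts at offset 3, so it is unaligned and
 * overlaps the last byte written by the first store. */
static void overlapping_stores(uint8_t *buf)
{
    uint32_t zero = 0x00000000u;
    uint32_t ones = 0xFFFFFFFFu;

    memcpy(buf, &zero, sizeof zero);     /* bytes 0..3 <- 0x00 */
    memcpy(buf + 3, &ones, sizeof ones); /* bytes 3..6 <- 0xFF */
}

/* Hypothetical check: program order on a single core requires the later
 * store to win on the overlapping byte (offset 3). */
static void check_overlapping_stores(void)
{
    uint8_t buf[8];

    memset(buf, 0xAA, sizeof buf);       /* sentinel pattern */
    overlapping_stores(buf);

    assert(buf[0] == 0x00 && buf[1] == 0x00 && buf[2] == 0x00);
    assert(buf[3] == 0xFF);              /* the overlapping byte */
    assert(buf[4] == 0xFF && buf[5] == 0xFF && buf[6] == 0xFF);
    assert(buf[7] == 0xAA);              /* untouched by either store */
}
```

Calling check_overlapping_stores() from main() should pass on any conforming implementation when buf is normal memory. The values 0x00000000 and 0xFFFFFFFF make the expected bytes independent of endianness.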