On Wed, 8 Aug 2018, Catalin Marinas wrote:

> On Wed, Aug 08, 2018 at 02:26:11PM +0000, David Laight wrote:
> > From: Mikulas Patocka
> > > Sent: 08 August 2018 14:47
> > ...
> > > The problem on ARM is that I see data corruption when the overlapping
> > > unaligned writes are done just by a single core.
> >
> > Is this a sequence of unaligned writes (that shouldn't modify the
> > same physical locations) or an aligned write followed by an
> > unaligned one that updates part of the earlier write.
> > (Or the opposite order?)
>
> In the memcpy() case, there can be a sequence of unaligned writes but
> they would not modify the same byte (so no overlapping address at the
> byte level).

They do modify the same byte, but with the same value. Suppose that you
want to copy a piece of data that is between 8 and 16 bytes long. You can
do this:

	add src_end, src, len
	add dst_end, dst, len
	ldr x0, [src]
	ldr x1, [src_end - 8]
	str x0, [dst]
	str x1, [dst_end - 8]

The ARM64 memcpy uses this trick heavily in order to reduce branching, and
this is what makes the PCIe controller choke.

Mikulas
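
For illustration, the overlapping-store trick described above can be sketched in portable C. This is only a sketch under stated assumptions, not the actual ARM64 memcpy code: the function name copy_8_to_16 is hypothetical, and each fixed-size 8-byte memcpy() stands in for one unaligned 64-bit ldr or str.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: copy len bytes, where 8 <= len <= 16,
 * without branching on the exact value of len.
 */
static void copy_8_to_16(unsigned char *dst, const unsigned char *src,
			 size_t len)
{
	uint64_t head, tail;

	/* Load the first 8 and the last 8 bytes; they overlap if len < 16. */
	memcpy(&head, src, 8);
	memcpy(&tail, src + len - 8, 8);

	/* Store both; the overlapping bytes get the same value twice. */
	memcpy(dst, &head, 8);
	memcpy(dst + len - 8, &tail, 8);
}
```

With len = 12, for example, the first store covers dst[0..7] and the second covers dst[4..11], so bytes 4..7 are written twice with identical data. That is the access pattern in question.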