On Fri, Nov 24, 2023 at 12:58:11PM +0000, Robin Murphy wrote:
> > diff --git a/arch/arm64/include/asm/io.h b/arch/arm64/include/asm/io.h
> > index 3b694511b98f..73ab91913790 100644
> > --- a/arch/arm64/include/asm/io.h
> > +++ b/arch/arm64/include/asm/io.h
> > @@ -135,6 +135,26 @@ extern void __memset_io(volatile void __iomem *, int, size_t);
> >  #define memcpy_fromio(a,c,l)	__memcpy_fromio((a),(c),(l))
> >  #define memcpy_toio(c,a,l)	__memcpy_toio((c),(a),(l))
> > +static inline void __memcpy_toio_64(volatile void __iomem *to, const void *from)
> > +{
> > +	const u64 *from64 = from;
> > +
> > +	/*
> > +	 * Newer ARM core have sensitive write combining buffers, it is
> > +	 * important that the stores be contiguous blocks of store instructions.
> > +	 * Normal memcpy does not work reliably.
> > +	 */
> > +	asm volatile("stp %x0, %x1, [%8, #16 * 0]\n"
> > +		     "stp %x2, %x3, [%8, #16 * 1]\n"
> > +		     "stp %x4, %x5, [%8, #16 * 2]\n"
> > +		     "stp %x6, %x7, [%8, #16 * 3]\n"
> > +		     :
> > +		     : "rZ"(from64[0]), "rZ"(from64[1]), "rZ"(from64[2]),
> > +		       "rZ"(from64[3]), "rZ"(from64[4]), "rZ"(from64[5]),
> > +		       "rZ"(from64[6]), "rZ"(from64[7]), "r"(to));
>
> Is this correct for big-endian? LDP/STP are kinda tricksy in that regard.

Uh.. I didn't think about it at all..

By no means do I have any skill reading the ARM documents, but I think
it is OK. For STP it says:

   Mem[address, dbytes, AccType_NORMAL] = data1;
   Mem[address+dbytes, dbytes, AccType_NORMAL] = data2;

With 64-bit registers dbytes is 8, so I understand that as:

   Mem[%8 + 16 * 0,     8, AccType_NORMAL] = from64[0]
   Mem[%8 + 16 * 0 + 8, 8, AccType_NORMAL] = from64[1]
   Mem[%8 + 16 * 1,     8, AccType_NORMAL] = from64[2]
   Mem[%8 + 16 * 1 + 8, 8, AccType_NORMAL] = from64[3]
   ..

Which is the same on BE/LE? But I don't know the pitfall to watch for
here.

This is memcpy, so we don't have to swap: each u64 is loaded and stored
with the same endianness, so the order of the bytes within the register
doesn't matter.

Thanks,
Jason
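
For illustration only, here is a minimal userspace sketch of the "memcpy
doesn't need a swap" argument above. It is not the kernel code and does not
use STP or MMIO; copy64_by_u64() and copy64_preserves_bytes() are
hypothetical names made up for this example. It moves 64 bytes as eight
64-bit values and checks that the destination is byte-identical to the
source, which should hold on either endianness because each value is loaded
and stored with the same byte order.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: copy 64 bytes as eight 64-bit loads and stores.
 * memcpy() to/from a local u64 stands in for a plain 64-bit load/store
 * without running into alignment or aliasing problems in portable C.
 */
static void copy64_by_u64(void *to, const void *from)
{
	uint64_t v;
	size_t i;

	assert(to != NULL && from != NULL);

	for (i = 0; i < 8; i++) {
		/* 64-bit load from source offset 8 * i */
		memcpy(&v, (const unsigned char *)from + 8 * i, sizeof(v));
		/* 64-bit store to the same offset in the destination */
		memcpy((unsigned char *)to + 8 * i, &v, sizeof(v));
	}
}

/* Returns 1 if the destination ends up byte-identical to the source. */
static int copy64_preserves_bytes(void)
{
	unsigned char src[64], dst[64];
	size_t i;

	/* Distinct byte values so any reordering would be visible. */
	for (i = 0; i < sizeof(src); i++)
		src[i] = (unsigned char)(i * 7 + 1);
	memset(dst, 0, sizeof(dst));

	copy64_by_u64(dst, src);

	return memcmp(dst, src, sizeof(src)) == 0;
}
```

The value held in v differs between LE and BE builds, but the bytes that
land in dst do not, which is the property the patch relies on. This says
nothing about how the write combining buffer behaves.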