On Mon, 21 Aug 2023 at 14:45, Weili Qian <qianweili@xxxxxxxxxx> wrote: > > > > On 2023/8/21 18:26, Will Deacon wrote: > > On Mon, Aug 21, 2023 at 04:19:54PM +0800, Herbert Xu wrote: > >> On Sat, Aug 19, 2023 at 09:33:18AM +0200, Ard Biesheuvel wrote: > >>> > >>> No, that otx2_write128() routine looks buggy, actually, The ! at the > >>> end means writeback, and so the register holding addr will be > >>> modified, which is not reflect in the asm constraints. It also lacks a > >>> barrier. > >> > >> OK. But at least having a helper called write128 looks a lot > >> cleaner than just having unexplained assembly in the code. > > > > I guess we want something similar to how writeq() is handled on 32-bit > > architectures (see include/linux/io-64-nonatomic-{hi-lo,lo-hi}.h. > > > > It's then CPU-dependent on whether you get atomicity. > > > > Will > > . > > > > Thanks for the review. > > Since the HiSilicon accelerator devices are supported only on the ARM64 platform, > the following 128bit read and write interfaces are added to the driver, is this OK? No, this does not belong in the driver. As Will is suggesting, we should define some generic helpers for this, and provide an arm64 implementation if the generic one does not result in the correct codegen. > > #if defined(CONFIG_ARM64) > static void qm_write128(void __iomem *addr, const void *buffer) > { > unsigned long tmp0 = 0, tmp1 = 0; > > asm volatile("ldp %0, %1, %3\n" > "stp %0, %1, %2\n" > "dmb oshst\n" > : "=&r" (tmp0), > "=&r" (tmp1), > "+Q" (*((char __iomem *)addr)) This constraint describes the first byte of addr, which is sloppy but probably works fine. > : "Q" (*((char *)buffer)) This constraint describes the first byte of buffer, which might cause problems because the asm reads the entire buffer not just the first byte. > : "memory"); > } > > static void qm_read128(void *buffer, const void __iomem *addr) > { > unsigned long tmp0 = 0, tmp1 = 0; > > asm volatile("ldp %0, %1, %3\n" > "stp %0, %1, %2\n" > "dmb oshst\n" Is this the right barrier for a read? > : "=&r" (tmp0), > "=&r" (tmp1), > "+Q" (*((char *)buffer)) > : "Q" (*((char __iomem *)addr)) > : "memory"); > } > > #else > static void qm_write128(void __iomem *addr, const void *buffer) > { > > } > > static void qm_read128(void *buffer, const void __iomem *addr) > { > > } > #endif > Have you tried using __uint128_t accesses instead? E.g., something like static void qm_write128(void __iomem *addr, const void *buffer) { volatile __uint128_t *dst = (volatile __uint128_t __force *)addr; const __uint128_t *src __aligned(1) = buffer; WRITE_ONCE(*dst, *src); dma_wmb(); } should produce the right instruction sequence, and works on all architectures that have CONFIG_ARCH_SUPPORTS_INT128=y This needs a big fat comment describing that the MMIO access needs to be atomic, but we could consider it as a fallback if we decide not to bother with special MMIO accessors, although I suspect we'll be needing them more widely at some point anyway.