On 2023/8/21 22:36, Ard Biesheuvel wrote: > On Mon, 21 Aug 2023 at 14:45, Weili Qian <qianweili@xxxxxxxxxx> wrote: >> >> >> >> On 2023/8/21 18:26, Will Deacon wrote: >>> On Mon, Aug 21, 2023 at 04:19:54PM +0800, Herbert Xu wrote: >>>> On Sat, Aug 19, 2023 at 09:33:18AM +0200, Ard Biesheuvel wrote: >>>>> >>>>> No, that otx2_write128() routine looks buggy, actually, The ! at the >>>>> end means writeback, and so the register holding addr will be >>>>> modified, which is not reflect in the asm constraints. It also lacks a >>>>> barrier. >>>> >>>> OK. But at least having a helper called write128 looks a lot >>>> cleaner than just having unexplained assembly in the code. >>> >>> I guess we want something similar to how writeq() is handled on 32-bit >>> architectures (see include/linux/io-64-nonatomic-{hi-lo,lo-hi}.h. >>> >>> It's then CPU-dependent on whether you get atomicity. >>> >>> Will >>> . >>> >> >> Thanks for the review. >> >> Since the HiSilicon accelerator devices are supported only on the ARM64 platform, >> the following 128bit read and write interfaces are added to the driver, is this OK? > > No, this does not belong in the driver. As Will is suggesting, we > should define some generic helpers for this, and provide an arm64 > implementation if the generic one does not result in the correct > codegen. > Sorry, I misunderstood here. >> >> #if defined(CONFIG_ARM64) >> static void qm_write128(void __iomem *addr, const void *buffer) >> { >> unsigned long tmp0 = 0, tmp1 = 0; >> >> asm volatile("ldp %0, %1, %3\n" >> "stp %0, %1, %2\n" >> "dmb oshst\n" >> : "=&r" (tmp0), >> "=&r" (tmp1), >> "+Q" (*((char __iomem *)addr)) > > This constraint describes the first byte of addr, which is sloppy but > probably works fine. > >> : "Q" (*((char *)buffer)) > > This constraint describes the first byte of buffer, which might cause > problems because the asm reads the entire buffer not just the first > byte. I don't understand why constraint describes the first byte of buffer,and the compilation result seems to be ok. 1811 1afc: a9400a61 ldp x1, x2, [x19] 1812 1b00: a9000801 stp x1, x2, [x0] 1813 1b04: d50332bf dmb oshst Maybe I'm wrong about 'Q', could you explain it or where can I learn more about this? > >> : "memory"); >> } >> >> static void qm_read128(void *buffer, const void __iomem *addr) >> { >> unsigned long tmp0 = 0, tmp1 = 0; >> >> asm volatile("ldp %0, %1, %3\n" >> "stp %0, %1, %2\n" >> "dmb oshst\n" > > Is this the right barrier for a read? Should be "dmb oshld\n". > >> : "=&r" (tmp0), >> "=&r" (tmp1), >> "+Q" (*((char *)buffer)) >> : "Q" (*((char __iomem *)addr)) >> : "memory"); >> } >> >> #else >> static void qm_write128(void __iomem *addr, const void *buffer) >> { >> >> } >> >> static void qm_read128(void *buffer, const void __iomem *addr) >> { >> >> } >> #endif >> > > Have you tried using __uint128_t accesses instead? > > E.g., something like > > static void qm_write128(void __iomem *addr, const void *buffer) > { > volatile __uint128_t *dst = (volatile __uint128_t __force *)addr; > const __uint128_t *src __aligned(1) = buffer; > > WRITE_ONCE(*dst, *src); > dma_wmb(); > } > > should produce the right instruction sequence, and works on all > architectures that have CONFIG_ARCH_SUPPORTS_INT128=y > I tried this, but WRITE_ONCE does not support type __uint128_t. ->WRITE_ONCE ->compiletime_assert_rwonce_type ->compiletime_assert(__native_word(t) || sizeof(t) == sizeof(long long), \ "Unsupported access size for {READ,WRITE}_ONCE().") So can we define generic IO helpers based on patchset https://lore.kernel.org/all/20180124090519.6680-4-ynorov@xxxxxxxxxxxxxxxxxx/ Part of the implementation is as follows: add writeo() in include/asm-generic/io.h #ifdef CONFIG_ARCH_SUPPORTS_INT128 #ifndef writeo #define writeo writeo static inline void writeo(__uint128_t value, volatile void __iomem *addr) { __io_bw(); __raw_writeo((__uint128_t __force)__cpu_to_le128(value), addr); //__cpu_to_le128 will implement. __io_aw(); } #endif #endif /* CONFIG_ARCH_SUPPORTS_INT128 */ #ifdef CONFIG_ARCH_SUPPORTS_INT128 #ifndef iowrite128 #define iowrite128 iowrite128 static inline void iowrite128(__uint128_t value, volatile void __iomem *addr) { writeo(value, addr); } #endif #endif /* CONFIG_ARCH_SUPPORTS_INT128 */ in arch/arm64/include/asm/io.h #ifdef CONFIG_ARCH_SUPPORTS_INT128 #define __raw_writeo __raw_writeo static __always_inline void __raw_writeo(__uint128_t val, volatile void __iomem *addr) { u64 hi, lo; lo = (u64)val; hi = (u64)(val >> 64); asm volatile ("stp %x0, %x1, [%2]\n" :: "rZ"(lo), "rZ"(hi), "r"(addr)); } #endif /* CONFIG_ARCH_SUPPORTS_INT128 */ And add io{read|write}128bits in include/linux/io-64-nonatomic-{hi-lo,lo-hi}.h. static inline void lo_hi_writeo(__uint128_t val, volatile void __iomem *addr) { writeq(val, addr); writeq(val >> 64, addr); } #ifndef writeo #define writeo lo_hi_writeo #endif static inline void hi_lo_writeo(__uint128_t val, volatile void __iomem *addr) { writeq(val >> 64, addr); writeq(val, addr); } #ifndef writeo #define writeo hi_lo_writeo #endif If this is ok, I'm going to implement reado and writeo, then submit the patchset. Thanks, Weili > This needs a big fat comment describing that the MMIO access needs to > be atomic, but we could consider it as a fallback if we decide not to > bother with special MMIO accessors, although I suspect we'll be > needing them more widely at some point anyway. > . >