On 2023/8/30 3:37, Arnd Bergmann wrote: > On Thu, Aug 24, 2023, at 23:07, Weili Qian wrote: >> On 2023/8/21 22:36, Ard Biesheuvel wrote: >>> On Mon, 21 Aug 2023 at 14:45, Weili Qian <qianweili@xxxxxxxxxx> wrote: > >>>> : "Q" (*((char *)buffer)) >>> >>> This constraint describes the first byte of buffer, which might cause >>> problems because the asm reads the entire buffer not just the first >>> byte. >> I don't understand why constraint describes the first byte of >> buffer,and the compilation result seems to be ok. >> >> 1811 1afc: a9400a61 ldp x1, x2, [x19] >> 1812 1b00: a9000801 stp x1, x2, [x0] >> 1813 1b04: d50332bf dmb oshst >> Maybe I'm wrong about 'Q', could you explain it or where can I learn >> more about this? > > The "Q" is not the problem here, it's the cast to (char *), which > tells the compiler that only the first byte is used here, and > allows it to not actually store the rest of the buffer into > memory. > > It's not a problem on the __iomem pointer side, since gcc never > accesses those directly, and for the version taking a __u128 literal > or two __u64 registers it is also ok. Thanks for your reply, I have got it. > >>>> unsigned long tmp0 = 0, tmp1 = 0; >>>> >>>> asm volatile("ldp %0, %1, %3\n" >>>> "stp %0, %1, %2\n" >>>> "dmb oshst\n" >>> >>> Is this the right barrier for a read? >> Should be "dmb oshld\n". > > As I said, this needs to be __io_ar(), which might be > defined in a different way. > >>> >>> Have you tried using __uint128_t accesses instead? >>> >>> E.g., something like >>> >>> static void qm_write128(void __iomem *addr, const void *buffer) >>> { >>> volatile __uint128_t *dst = (volatile __uint128_t __force *)addr; >>> const __uint128_t *src __aligned(1) = buffer; >>> >>> WRITE_ONCE(*dst, *src); >>> dma_wmb(); >>> } >>> >>> should produce the right instruction sequence, and works on all >>> architectures that have CONFIG_ARCH_SUPPORTS_INT128=y >>> >> >> I tried this, but WRITE_ONCE does not support type __uint128_t. >> ->WRITE_ONCE >> ->compiletime_assert_rwonce_type >> ->compiletime_assert(__native_word(t) || sizeof(t) == sizeof(long long), \ >> "Unsupported access size for {READ,WRITE}_ONCE().") > > On top of that, WRITE_ONCE() does not guarantee atomicity, and > dma_wmb() might not be the correct barrier. > >> So can we define generic IO helpers based on patchset >> https://lore.kernel.org/all/20180124090519.6680-4-ynorov@xxxxxxxxxxxxxxxxxx/ >> Part of the implementation is as follows: >> >> add writeo() in include/asm-generic/io.h >> >> #ifdef CONFIG_ARCH_SUPPORTS_INT128 >> #ifndef writeo >> #define writeo writeo >> static inline void writeo(__uint128_t value, volatile void __iomem >> *addr) >> { >> __io_bw(); >> __raw_writeo((__uint128_t __force)__cpu_to_le128(value), addr); >> //__cpu_to_le128 will implement. >> __io_aw(); >> } >> #endif >> #endif /* CONFIG_ARCH_SUPPORTS_INT128 */ > > Right, this is fairly close to what we need. The 'o' notation > might be slightly controversial, which is why I suggested > definining only iowrite128() to avoid having to agree on > the correct letter for 16-byte stores. Okay, I'll just define iowrite128() and ioread128(). Thanks, Weili > >> in arch/arm64/include/asm/io.h >> >> #ifdef CONFIG_ARCH_SUPPORTS_INT128 >> #define __raw_writeo __raw_writeo >> static __always_inline void __raw_writeo(__uint128_t val, volatile >> void __iomem *addr) >> { >> u64 hi, lo; >> >> lo = (u64)val; >> hi = (u64)(val >> 64); >> >> asm volatile ("stp %x0, %x1, [%2]\n" :: "rZ"(lo), "rZ"(hi), "r"(addr)); >> } >> #endif /* CONFIG_ARCH_SUPPORTS_INT128 */ > > This definition looks fine. > >> And add io{read|write}128bits in include/linux/io-64-nonatomic-{hi-lo,lo-hi}.h. >> static inline void lo_hi_writeo(__uint128_t val, volatile void __iomem *addr) >> { >> writeq(val, addr); >> writeq(val >> 64, addr); >> } > > This also looks fine. > > Arnd > . >