On Thu, Aug 24, 2023, at 23:07, Weili Qian wrote: > On 2023/8/21 22:36, Ard Biesheuvel wrote: >> On Mon, 21 Aug 2023 at 14:45, Weili Qian <qianweili@xxxxxxxxxx> wrote: >>> : "Q" (*((char *)buffer)) >> >> This constraint describes the first byte of buffer, which might cause >> problems because the asm reads the entire buffer not just the first >> byte. > I don't understand why constraint describes the first byte of > buffer,and the compilation result seems to be ok. > > 1811 1afc: a9400a61 ldp x1, x2, [x19] > 1812 1b00: a9000801 stp x1, x2, [x0] > 1813 1b04: d50332bf dmb oshst > Maybe I'm wrong about 'Q', could you explain it or where can I learn > more about this? The "Q" is not the problem here, it's the cast to (char *), which tells the compiler that only the first byte is used here, and allows it to not actually store the rest of the buffer into memory. It's not a problem on the __iomem pointer side, since gcc never accesses those directly, and for the version taking a __u128 literal or two __u64 registers it is also ok. >>> unsigned long tmp0 = 0, tmp1 = 0; >>> >>> asm volatile("ldp %0, %1, %3\n" >>> "stp %0, %1, %2\n" >>> "dmb oshst\n" >> >> Is this the right barrier for a read? > Should be "dmb oshld\n". As I said, this needs to be __io_ar(), which might be defined in a different way. >> >> Have you tried using __uint128_t accesses instead? >> >> E.g., something like >> >> static void qm_write128(void __iomem *addr, const void *buffer) >> { >> volatile __uint128_t *dst = (volatile __uint128_t __force *)addr; >> const __uint128_t *src __aligned(1) = buffer; >> >> WRITE_ONCE(*dst, *src); >> dma_wmb(); >> } >> >> should produce the right instruction sequence, and works on all >> architectures that have CONFIG_ARCH_SUPPORTS_INT128=y >> > > I tried this, but WRITE_ONCE does not support type __uint128_t. > ->WRITE_ONCE > ->compiletime_assert_rwonce_type > ->compiletime_assert(__native_word(t) || sizeof(t) == sizeof(long long), \ > "Unsupported access size for {READ,WRITE}_ONCE().") On top of that, WRITE_ONCE() does not guarantee atomicity, and dma_wmb() might not be the correct barrier. > So can we define generic IO helpers based on patchset > https://lore.kernel.org/all/20180124090519.6680-4-ynorov@xxxxxxxxxxxxxxxxxx/ > Part of the implementation is as follows: > > add writeo() in include/asm-generic/io.h > > #ifdef CONFIG_ARCH_SUPPORTS_INT128 > #ifndef writeo > #define writeo writeo > static inline void writeo(__uint128_t value, volatile void __iomem > *addr) > { > __io_bw(); > __raw_writeo((__uint128_t __force)__cpu_to_le128(value), addr); > //__cpu_to_le128 will implement. > __io_aw(); > } > #endif > #endif /* CONFIG_ARCH_SUPPORTS_INT128 */ Right, this is fairly close to what we need. The 'o' notation might be slightly controversial, which is why I suggested definining only iowrite128() to avoid having to agree on the correct letter for 16-byte stores. > in arch/arm64/include/asm/io.h > > #ifdef CONFIG_ARCH_SUPPORTS_INT128 > #define __raw_writeo __raw_writeo > static __always_inline void __raw_writeo(__uint128_t val, volatile > void __iomem *addr) > { > u64 hi, lo; > > lo = (u64)val; > hi = (u64)(val >> 64); > > asm volatile ("stp %x0, %x1, [%2]\n" :: "rZ"(lo), "rZ"(hi), "r"(addr)); > } > #endif /* CONFIG_ARCH_SUPPORTS_INT128 */ This definition looks fine. > And add io{read|write}128bits in include/linux/io-64-nonatomic-{hi-lo,lo-hi}.h. > static inline void lo_hi_writeo(__uint128_t val, volatile void __iomem *addr) > { > writeq(val, addr); > writeq(val >> 64, addr); > } This also looks fine. Arnd