Hi all,

This series adds an API for 128-bit memory I/O access and enables it for
arm64.

The original motivation for the 128-bit API came from a new Cavium network
device driver. The hardware requires 128-bit accesses to work. See the
description in patch 3 for details.

Also, starting from ARMv8.4, the stp and ldp instructions become atomic, so
an API for 128-bit access would be helpful in core arm64 code.

This series is an RFC. I'd like to collect opinions on the idea and on
implementation details.

* I didn't implement all of the 128-bit operations that exist for 64-bit
  variables and other types (__swab128p etc). Do we need them all right
  now, or can we add them when they are actually needed?
* The u128 name is already used in crypto code, so here I use __uint128_t,
  which comes from GCC, for 128-bit types. Should I rename the existing
  type in crypto and make the core code for 128-bit variables consistent
  with u64, u32 etc? (I think yes, but would like to ask the crypto people
  about it.)
* Some compilers don't support __uint128_t, so I protected all generic code
  with the config option HAVE_128BIT_ACCESS. I think it's OK, but...
* For the 128-bit read/write functions I use the suffix 'o', which means
  read/write the octet of bytes. Is this name OK?
* My mips-linux-gnu-gcc v6.3.0 doesn't support __uint128_t, and I don't
  have another BE setup on hand, so the BE case is formally not tested.
  The BE code generated for arm64 looks fine, though.

With all that, this example code:

	static int __init 128bit_test(void)
	{
		__uint128_t v;
		__uint128_t addr;
		__uint128_t val = (__uint128_t) 0x1234567890abc;

		val |= ((__uint128_t) 0xdeadbeaf) << 64;

		writeo(val, &addr);
		v = reado(&addr);

		pr_err("%llx%llx\n", (u64) (val >> 64), (u64) val);
		pr_err("%llx%llx\n", (u64) (v >> 64), (u64) v);

		return v != val;
	}

Generates this listing for arm64-le:

0000000000000000 <128bit_test>:
   0:   a9bb7bfd        stp     x29, x30, [sp, #-80]!
   4:   910003fd        mov     x29, sp
   8:   a90153f3        stp     x19, x20, [sp, #16]
   c:   a9025bf5        stp     x21, x22, [sp, #32]
  10:   f9001bf7        str     x23, [sp, #48]
  14:   d5033e9f        dsb     st
  18:   d2815797        mov     x23, #0xabc                     // #2748
  1c:   d297d5f6        mov     x22, #0xbeaf                    // #48815
  20:   f2acf137        movk    x23, #0x6789, lsl #16
  24:   f2bbd5b6        movk    x22, #0xdead, lsl #16
  28:   f2c468b7        movk    x23, #0x2345, lsl #32
  2c:   f2e00037        movk    x23, #0x1, lsl #48
  30:   a9045bb7        stp     x23, x22, [x29, #64]
  34:   a94453b3        ldp     x19, x20, [x29, #64]
  38:   d5033d9f        dsb     ld
  3c:   90000015        adrp    x21, 0 <128bit_test>
  40:   910002b5        add     x21, x21, #0x0
  44:   aa1703e2        mov     x2, x23
  48:   aa1603e1        mov     x1, x22
  4c:   aa1503e0        mov     x0, x21
  50:   94000000        bl      0 <printk>
  54:   aa1303e2        mov     x2, x19
  58:   aa1403e1        mov     x1, x20
  5c:   ca170273        eor     x19, x19, x23
  60:   ca160294        eor     x20, x20, x22
  64:   aa1503e0        mov     x0, x21
  68:   aa140273        orr     x19, x19, x20
  6c:   94000000        bl      0 <printk>
  70:   f9401bf7        ldr     x23, [sp, #48]
  74:   f100027f        cmp     x19, #0x0
  78:   a94153f3        ldp     x19, x20, [sp, #16]
  7c:   1a9f07e0        cset    w0, ne  // ne = any
  80:   a9425bf5        ldp     x21, x22, [sp, #32]
  84:   a8c57bfd        ldp     x29, x30, [sp], #80
  88:   d65f03c0        ret

And for arm64-be:

0000000000000000 <128bit_test>:
   0:   a9bb7bfd        stp     x29, x30, [sp, #-80]!
   4:   910003fd        mov     x29, sp
   8:   a90153f3        stp     x19, x20, [sp, #16]
   c:   a9025bf5        stp     x21, x22, [sp, #32]
  10:   f9001bf7        str     x23, [sp, #48]
  14:   d5033e9f        dsb     st
  18:   d2802001        mov     x1, #0x100                      // #256
  1c:   d2d5bbc0        mov     x0, #0xadde00000000             // #191168994344960
  20:   f2a8a461        movk    x1, #0x4523, lsl #16
  24:   f2f5f7c0        movk    x0, #0xafbe, lsl #48
  28:   f2d12ce1        movk    x1, #0x8967, lsl #32
  2c:   f2f78141        movk    x1, #0xbc0a, lsl #48
  30:   a90407a0        stp     x0, x1, [x29, #64]
  34:   a94453b3        ldp     x19, x20, [x29, #64]
  38:   dac00e73        rev     x19, x19
  3c:   dac00e94        rev     x20, x20
  40:   d5033d9f        dsb     ld
  44:   d2815796        mov     x22, #0xabc                     // #2748
  48:   90000015        adrp    x21, 0 <128bit_test>
  4c:   f2acf136        movk    x22, #0x6789, lsl #16
  50:   910002b5        add     x21, x21, #0x0
  54:   f2c468b6        movk    x22, #0x2345, lsl #32
  58:   d297d5f7        mov     x23, #0xbeaf                    // #48815
  5c:   f2e00036        movk    x22, #0x1, lsl #48
  60:   f2bbd5b7        movk    x23, #0xdead, lsl #16
  64:   aa1603e2        mov     x2, x22
  68:   aa1703e1        mov     x1, x23
  6c:   aa1503e0        mov     x0, x21
  70:   94000000        bl      0 <printk>
  74:   aa1403e2        mov     x2, x20
  78:   aa1303e1        mov     x1, x19
  7c:   ca160294        eor     x20, x20, x22
  80:   ca170273        eor     x19, x19, x23
  84:   aa1503e0        mov     x0, x21
  88:   aa140273        orr     x19, x19, x20
  8c:   94000000        bl      0 <printk>
  90:   f9401bf7        ldr     x23, [sp, #48]
  94:   f100027f        cmp     x19, #0x0
  98:   a94153f3        ldp     x19, x20, [sp, #16]
  9c:   1a9f07e0        cset    w0, ne  // ne = any
  a0:   a9425bf5        ldp     x21, x22, [sp, #32]
  a4:   a8c57bfd        ldp     x29, x30, [sp], #80
  a8:   d65f03c0        ret

I tested the LE kernel with this, and it works for me. The BE version adds
a few extra instructions to swap bytes, but the generated code looks
reasonable. Where byteswapping is not needed, it can be avoided by using
__raw_reado() and __raw_writeo().
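
For readers who want to see what the byte swap on the BE path amounts to,
here is a minimal userspace sketch. It is not code from this series: the
names make128() and swab128_sketch() are hypothetical, and it assumes a
GCC- or Clang-compatible compiler that provides __uint128_t and
__builtin_bswap64(). The idea is that reversing 16 bytes is the same as
reversing each 64-bit half and exchanging the halves.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: build a 128-bit value from two 64-bit halves. */
static inline __uint128_t make128(uint64_t hi, uint64_t lo)
{
	return ((__uint128_t)hi << 64) | lo;
}

/*
 * Hypothetical helper, not the series' implementation: reverse all 16
 * bytes by byte-reversing each 64-bit half and exchanging the halves.
 */
static inline __uint128_t swab128_sketch(__uint128_t v)
{
	uint64_t hi = (uint64_t)(v >> 64);
	uint64_t lo = (uint64_t)v;

	return make128(__builtin_bswap64(lo), __builtin_bswap64(hi));
}

/* Swapping twice must give back the original value. */
static inline void swab128_selfcheck(__uint128_t v)
{
	assert(swab128_sketch(swab128_sketch(v)) == v);
}
```

Applied to the example value above, the two byte-reversed halves are
0xbc0a896745230100 and 0xafbeadde00000000, which are the two constants the
BE listing builds before the stp.
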

Yury Norov (3):
  UAPI: Introduce 128-bit types and byteswap operations
  asm-generic/io.h: API for 128-bit I/O accessors
  arm64: enable 128-bit memory read/write support

 arch/Kconfig                                 |   7 ++
 arch/arm64/include/asm/io.h                  |  31 ++++++
 include/asm-generic/io.h                     | 147 +++++++++++++++++++++++++++
 include/linux/byteorder/generic.h            |   4 +
 include/uapi/asm-generic/int-ll64.h          |   8 ++
 include/uapi/linux/byteorder/big_endian.h    |   2 +
 include/uapi/linux/byteorder/little_endian.h |   4 +
 include/uapi/linux/swab.h                    |  22 ++++
 include/uapi/linux/types.h                   |   4 +
 9 files changed, 229 insertions(+)

-- 
2.11.0