On Sat, Feb 25, 2023 at 05:50:58PM +0100, Peter Zijlstra wrote:
> On Fri, Feb 24, 2023 at 11:02:36AM +0100, Heiko Carstens wrote:
> > Add an s390 specific READ_ONCE_ALIGNED_128() helper, which can be used for
> > fast block concurrent (atomic) 128-bit accesses.
> >
> > The used lpq instruction requires 128-bit alignment. This is also the
> > reason why the compiler doesn't emit this instruction if __READ_ONCE() is
> > used for 128-bit accesses.
>
> Does your u128 not have natural alignment? Does it help if you force
> align the u128 type?

s390 seems to be the only architecture which has a 64-bit alignment for
__uint128_t. But making it explicitly naturally aligned doesn't help. I
guess that's because the lpq instruction requires an even-odd register pair
as its destination, while the lmg instruction that is used now can take any
register pair; but lmg doesn't come with atomic semantics.
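
For reference, the alignment question above can be checked with a small
userspace sketch. This is not the kernel helper, only an illustration: the
type name u128_aligned and the three functions are made up for this example,
and it assumes GCC or clang, since __uint128_t and the aligned attribute are
compiler extensions. It only shows what "force align the u128 type" means; it
says nothing about which instruction the compiler then picks.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name: a 128-bit type with natural (16-byte) alignment forced. */
typedef __uint128_t u128_aligned __attribute__((aligned(16)));

/*
 * Alignment the ABI gives the plain type. Per the mail above this is
 * 8 bytes on s390; other 64-bit ABIs typically use 16.
 */
static inline size_t plain_u128_align(void)
{
	return alignof(__uint128_t);
}

/* Alignment of the explicitly aligned typedef: 16 regardless of ABI. */
static inline size_t forced_u128_align(void)
{
	return alignof(u128_aligned);
}

/* Runtime check that an address meets the 128-bit alignment lpq needs. */
static inline int is_aligned_128(const void *p)
{
	return ((uintptr_t)p & 15) == 0;
}
```

Comparing plain_u128_align() with forced_u128_align() on an s390 toolchain
should show the difference described above, and is_aligned_128() is the kind
of check a caller could use on an object before relying on an lpq-based read.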