On Sun, Feb 26, 2023 at 09:56:44PM +0100, Heiko Carstens wrote:
> On Sat, Feb 25, 2023 at 05:50:58PM +0100, Peter Zijlstra wrote:
> > On Fri, Feb 24, 2023 at 11:02:36AM +0100, Heiko Carstens wrote:
> > > Add an s390 specific READ_ONCE_ALIGNED_128() helper, which can be used for
> > > fast block concurrent (atomic) 128-bit accesses.
> > >
> > > The used lpq instruction requires 128-bit alignment. This is also the
> > > reason why the compiler doesn't emit this instruction if __READ_ONCE() is
> > > used for 128-bit accesses.
> >
> > Does your u128 not have natural alignment? Does it help if you force
> > align the u128 type?
>
> s390 seems to be the only architecture which has a 64 bit alignment for
> __uint128_t. But making it explicitly naturally aligned doesn't help.
> I guess that's because the lpq instruction requires an even-odd register
> pair where it reads to, while the now used lmg instruction can use any
> register pair; but lmg doesn't come with atomic semantics.

One thing you could do is talk with your compiler folks about allowing
lpq for volatile loads. That won't help you now, and you'll still have to
do these patches, but changing the toolchains makes sense to me.
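For illustration, a minimal C sketch of the "force align the u128 type" idea from the exchange above, assuming GCC/Clang's `unsigned __int128` extension and the `aligned` type attribute. The names `u128_aligned` and `read_volatile_u128()` are hypothetical, not the kernel's. As noted in the thread, natural alignment is only a precondition for lpq: on s390 the compiler still emits lmg for such a volatile load, which is why toolchain support would be needed.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical type: a 128-bit integer forced to natural (16-byte)
 * alignment. Per the thread, s390 gives __uint128_t only 8-byte
 * alignment by default.
 */
typedef unsigned __int128 u128_aligned __attribute__((aligned(16)));

_Static_assert(_Alignof(u128_aligned) == 16,
               "u128_aligned must be naturally aligned");

/*
 * Hypothetical helper: a plain volatile 128-bit load, which is roughly
 * what __READ_ONCE() boils down to. The alignment check reflects lpq's
 * 16-byte alignment requirement; it does not by itself make the compiler
 * emit a single-copy atomic instruction for this load.
 */
static inline u128_aligned read_volatile_u128(const volatile u128_aligned *p)
{
	assert(((uintptr_t)p & 15) == 0);	/* lpq needs 16-byte alignment */
	return *p;
}
```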