On Thu, 2 May 2024 at 14:01, Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:
>
> +static inline unsigned long
> +____xchg_u8(volatile char *m, unsigned long val)
> +{
> +	unsigned long ret, tmp, addr64;
> +
> +	__asm__ __volatile__(
> +	"	andnot	%4,7,%3\n"
> +	"	insbl	%1,%4,%1\n"
> +	"1:	ldq_l	%2,0(%3)\n"
> +	"	extbl	%2,%4,%0\n"
> +	"	mskbl	%2,%4,%2\n"
> +	"	or	%1,%2,%2\n"
> +	"	stq_c	%2,0(%3)\n"
> +	"	beq	%2,2f\n"
> +	".subsection 2\n"
> +	"2:	br	1b\n"
> +	".previous"
> +	: "=&r" (ret), "=&r" (val), "=&r" (tmp), "=&r" (addr64)
> +	: "r" ((long)m), "1" (val) : "memory");
> +
> +	return ret;
> +}

Side note: if you move this around, I think you should just uninline
it too and turn it into a function call.

This inline asm doesn't actually take any advantage of the inlining.
The main reason to inline something like this is that you could then
deal with different compile-time alignments better than using the
generic software sequence.

But that's not what the inline asm actually does, and it uses the
worst-case code sequence for inserting the byte.

Put that together with "byte and word xchg are rare", and it really
smells to me like we shouldn't be inlining this.

Now, the 32-bit and 64-bit cases are different - more common, but also
much simpler code sequences. They seem worth inlining.

That said, maybe for alpha, the "just move code around" is better than
"fix up old bad decisions" just because the effort is lower.

             Linus
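
To make the "uninline it" suggestion concrete: the out-of-line version
is just the same asm body behind a real function call. A minimal
sketch, assuming the header keeps only a declaration and the body
moves into some arch/alpha .c file (the file name and placement here
are illustrative, not an actual patch):

	/* header: declaration only, so every caller emits a call */
	extern unsigned long ____xchg_u8(volatile char *m, unsigned long val);

	/*
	 * e.g. arch/alpha/lib/xchg.c (hypothetical location): the
	 * ldq_l/stq_c byte-insert sequence quoted above is now emitted
	 * once, instead of being duplicated at every call site.
	 */
	unsigned long ____xchg_u8(volatile char *m, unsigned long val)
	{
		unsigned long ret, tmp, addr64;

		__asm__ __volatile__(
		"	andnot	%4,7,%3\n"
		"	insbl	%1,%4,%1\n"
		"1:	ldq_l	%2,0(%3)\n"
		"	extbl	%2,%4,%0\n"
		"	mskbl	%2,%4,%2\n"
		"	or	%1,%2,%2\n"
		"	stq_c	%2,0(%3)\n"
		"	beq	%2,2f\n"
		".subsection 2\n"
		"2:	br	1b\n"
		".previous"
		: "=&r" (ret), "=&r" (val), "=&r" (tmp), "=&r" (addr64)
		: "r" ((long)m), "1" (val) : "memory");

		return ret;
	}

The size dispatch would then mix the two approaches: the rare byte and
word cases call out of line, while the common 32-bit and 64-bit cases
keep their simpler inline sequences. Again just a sketch with made-up
helper names, not the actual alpha code:

	static __always_inline unsigned long
	____xchg(volatile void *ptr, unsigned long x, int size)
	{
		switch (size) {
		case 1:	/* out-of-line call */
			return ____xchg_u8((volatile char *)ptr, x);
		case 2:	/* out-of-line call */
			return ____xchg_u16((volatile short *)ptr, x);
		case 4:	/* inline asm, stays inlined */
			return ____xchg_u32((volatile int *)ptr, x);
		case 8:	/* inline asm, stays inlined */
			return ____xchg_u64((volatile long *)ptr, x);
		}
		return x;	/* unreachable for valid sizes */
	}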