On Thu, May 02, 2024 at 09:53:45PM +0100, Al Viro wrote:
> On Thu, May 02, 2024 at 06:33:49AM -0700, Paul E. McKenney wrote:
>
> > Understood, and this sort of compatibility consideration is why this
> > version of this patchset does not emulate two-byte (16-bit) cmpxchg()
> > operations.  The original (RFC) series did emulate these, which does
> > not work on a few architectures that do not provide 16-bit load/store
> > instructions, hence no 16-bit support in this series.
> >
> > So this one-byte-only series affects only Alpha systems lacking
> > single-byte load/store instructions.  If I understand correctly, Alpha
> > 21164A (EV56) and later *do* have single-byte load/store instructions,
> > and thus are still just fine.  In fact, it looks like EV56 also has
> > two-byte load/store instructions, and so would have been OK with
> > the original one-/two-byte RFC series.
>
> Wait a sec.  On Alpha we already implement 16bit and 8bit xchg and cmpxchg.
> See arch/alpha/include/asm/xchg.h:
> static inline unsigned long
> ____cmpxchg(_u16, volatile short *m, unsigned short old, unsigned short new)
> {
>	unsigned long prev, tmp, cmp, addr64;
>
>	__asm__ __volatile__(
>	" andnot %5,7,%4\n"
>	" inswl %1,%5,%1\n"
>	"1: ldq_l %2,0(%4)\n"
>	" extwl %2,%5,%0\n"
>	" cmpeq %0,%6,%3\n"
>	" beq %3,2f\n"
>	" mskwl %2,%5,%2\n"
>	" or %1,%2,%2\n"
>	" stq_c %2,0(%4)\n"
>	" beq %2,3f\n"
>	"2:\n"
>	".subsection 2\n"
>	"3: br 1b\n"
>	".previous"
>	: "=&r" (prev), "=&r" (new), "=&r" (tmp), "=&r" (cmp), "=&r" (addr64)
>	: "r" ((long)m), "Ir" (old), "1" (new) : "memory");
>
>	return prev;
> }
>
> Load-locked and store-conditional are done on 64bit value, with
> 16bit operations done in registers.  This is what 16bit store
> (assignment to unsigned short *) turns into with
>	stw $17,0($16)		// *(u16*)r16 = r17
> and without -mbwx
>	insql $17,$16,$17	// r17 = r17 << (8 * (r16 & 7))
>	ldq_u $1,0($16)		// r1 = *(u64 *)(r16 & ~7)
>	mskwl $1,$16,$1		// r1 &= ~(0xffff << (8 * (r16 & 7))
>	bis $17,$1,$17		// r17 |= r1
>	stq_u $17,0($16)	// *(u64 *)(r16 & ~7) = r17
>
> What's more, load-locked/store-conditional doesn't have 16bit and 8bit
> variants on any Alphas - it's always 32bit (ldl_l) or 64bit (ldq_l).
>
> What BWX adds is load/store byte/word, load/store byte/word unaligned
> and sign-extend byte/word.  IOW, it's absolutely irrelevant for
> cmpxchg (or xchg) purposes.

If you are only ever doing atomic read-modify-write operations on the
byte in question, then agreed, you don't care about byte loads and stores.

But there are use cases that do mix smp_store_release() with cmpxchg(),
and those use cases won't work unless at least byte store is implemented.
Or I suppose that we could use cmpxchg() instead of smp_store_release(),
but that is wasteful for architectures that do support byte stores.

So EV56 adds the byte loads and stores needed for those use cases.

Or am I missing your point?

							Thanx, Paul
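
To illustrate the emulation discussed above, here is a minimal userspace
sketch of a one-byte cmpxchg built from a 32-bit compare-and-swap on the
aligned word containing the byte.  It is not the code from the patchset or
from arch/alpha: the names cmpxchg_u8_emulated() and byte_shift() are
hypothetical, and it assumes the GCC/Clang __atomic builtins rather than
the kernel's own primitives.

```c
#include <stdint.h>

/* Hypothetical helper: bit offset of the addressed byte within its
 * aligned 32-bit word, depending on the host byte order. */
static unsigned int byte_shift(uintptr_t addr)
{
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
	return (3u - (unsigned int)(addr & 3)) * 8u;
#else
	return (unsigned int)(addr & 3) * 8u;
#endif
}

/* Hypothetical sketch of a one-byte cmpxchg emulated with a 32-bit CAS.
 * Returns the byte value observed at *p; the store happened only if
 * that value equals "old". */
static uint8_t cmpxchg_u8_emulated(uint8_t *p, uint8_t old, uint8_t new)
{
	uintptr_t addr = (uintptr_t)p;
	uint32_t *word = (uint32_t *)(addr & ~(uintptr_t)3);
	unsigned int shift = byte_shift(addr);
	uint32_t mask = (uint32_t)0xff << shift;
	uint32_t cur = __atomic_load_n(word, __ATOMIC_RELAXED);

	for (;;) {
		uint8_t seen = (uint8_t)((cur & mask) >> shift);
		uint32_t want;

		if (seen != old)
			return seen;	/* comparison failed, nothing stored */

		/* Replace only the target byte, keep the three neighbours. */
		want = (cur & ~mask) | ((uint32_t)new << shift);
		if (__atomic_compare_exchange_n(word, &cur, want, 0,
						__ATOMIC_SEQ_CST,
						__ATOMIC_RELAXED))
			return old;
		/* A neighbouring byte or the target changed; "cur" now
		 * holds the fresh word contents, so retry. */
	}
}
```

The retry loop is what keeps concurrent updates to neighbouring bytes safe.
A plain byte store compiled as the non-BWX ldq_u/mskwl/stq_u sequence quoted
above has no such protection, so it can overwrite a concurrent cmpxchg on
another byte of the same quadword.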