Re: [PATCH v2 cmpxchg 12/13] sh: Emulate one-byte cmpxchg

"Paul E. McKenney" <paulmck@xxxxxxxxxx> · Thu, 2 May 2024 14:18:48 -0700

On Thu, May 02, 2024 at 09:53:45PM +0100, Al Viro wrote:
> On Thu, May 02, 2024 at 06:33:49AM -0700, Paul E. McKenney wrote:
> 
> > Understood, and this sort of compatibility consideration is why this
> > version of this patchset does not emulate two-byte (16-bit) cmpxchg()
> > operations.  The original (RFC) series did emulate these, which does
> > not work on a few architectures that do not provide 16-bit load/store
> > instructions, hence no 16-bit support in this series.
> > 
> > So this one-byte-only series affects only Alpha systems lacking
> > single-byte load/store instructions.  If I understand correctly, Alpha
> > 21164A (EV56) and later *do* have single-byte load/store instructions,
> > and thus are still just fine.  In fact, it looks like EV56 also has
> > two-byte load/store instructions, and so would have been OK with
> > the original one-/two-byte RFC series.
> 
> Wait a sec.  On Alpha we already implement 16bit and 8bit xchg and cmpxchg.
> See arch/alpha/include/asm/xchg.h:
> static inline unsigned long
> ____cmpxchg(_u16, volatile short *m, unsigned short old, unsigned short new)
> {
>        unsigned long prev, tmp, cmp, addr64;
> 
>        __asm__ __volatile__(
>        "       andnot  %5,7,%4\n"
>        "       inswl   %1,%5,%1\n"
>        "1:     ldq_l   %2,0(%4)\n"
>        "       extwl   %2,%5,%0\n"
>        "       cmpeq   %0,%6,%3\n"
>        "       beq     %3,2f\n"
>        "       mskwl   %2,%5,%2\n"
>        "       or      %1,%2,%2\n"
>        "       stq_c   %2,0(%4)\n"
>        "       beq     %2,3f\n"
>        "2:\n"
>        ".subsection 2\n"
>        "3:     br      1b\n"
>        ".previous"
>        : "=&r" (prev), "=&r" (new), "=&r" (tmp), "=&r" (cmp), "=&r" (addr64)
>        : "r" ((long)m), "Ir" (old), "1" (new) : "memory");
> 
>        return prev;
> }
> 
> Load-locked and store-conditional are done on 64bit value, with
> 16bit operations done in registers.  This is what 16bit store
> (assignment to unsigned short *) turns into with
>         stw $17,0($16)		// *(u16*)r16 = r17
> and without -mbwx
>         insql $17,$16,$17	// r17 = r17 << (8 * (r16 & 7))
>         ldq_u $1,0($16)		// r1 = *(u64 *)(r16 & ~7)
> 	mskwl $1,$16,$1		// r1 &= ~(0xffff << (8 * (r16 & 7))
> 	bis $17,$1,$17		// r17 |= r1
> 	stq_u $17,0($16)	// *(u64 *)(r16 & ~7) = r17
> 
> What's more, load-locked/store-conditional doesn't have 16bit and 8bit
> variants on any Alphas - it's always 32bit (ldl_l) or 64bit (ldq_l).
> 
> What BWX adds is load/store byte/word, load/store byte/word unaligned
> and sign-extend byte/word.  IOW, it's absolutely irrelevant for
> cmpxchg (or xchg) purposes.

If you are only ever doing atomic read-modify-write operations on the
byte in question, then agreed, you don't care about byte loads and stores.

But there are use cases that do mix smp_store_release() with cmpxchg(),
and those use cases won't work unless at least byte store is implemented.
Or I suppose that we could use cmpxchg() instead of smp_store_release(),
but that is wasteful for architectures that do support byte stores.

So EV56 adds the byte loads and stores needed for those use cases.

Or am I missing your point?

							Thanx, Paul