On Tue, Apr 6, 2021 at 10:56 AM Stafford Horne <shorne@xxxxxxxxx> wrote:
> On Tue, Apr 06, 2021 at 11:50:38AM +0800, Guo Ren wrote:
> > On Wed, Mar 31, 2021 at 3:23 PM Arnd Bergmann <arnd@xxxxxxxx> wrote:
> > > On Wed, Mar 31, 2021 at 12:35 AM Stafford Horne <shorne@xxxxxxxxx> wrote:
> >
> > We shouldn't export xchg16/cmpxchg16(emulated by lr.w/sc.w) in riscv,
> > We should forbid these sub-word atomic primitive and lets the
> > programmers consider their atomic design.
>
> Fair enough, having the generic sub-word emulation would be something that
> an architecture can select to use/export.

I still have the feeling that we should generalize and unify the exact
behavior across architectures as much as possible here, while possibly
also trying to simplify the interface a little.

Looking through the various xchg()/cmpxchg() implementations, I find
eight distinct ways to do 8-bit and 16-bit atomics:

Full support:
      ia64, m68k (Atari only), x86, arm32 (v6k+), arm64

gcc/clang __sync_{val,bool}_compare_and_swap:
      s390

Emulated through ll/sc:
      alpha, powerpc

Emulated through cmpxchg loop:
      mips, openrisc, xtensa (xchg but not cmpxchg),
      sparc64 (cmpxchg_u8, xchg_u16 but not cmpxchg_u16 and xchg_u8!)

Emulated through local_irq_save (non SMP only):
      h8300, m68k (most), microblaze, mips, nds32, nios2

Emulated through hashed spinlock:
      parisc (8-bit only added in 2020, 16-bit still missing)

Forced compile-time error:
      arm32 (v4/v5/v6 non-SMP), arc, csky, riscv, parisc (16 bit),
      sparc32, sparc64, xtensa (cmpxchg)

Silently broken:
      hexagon

Since there are really only a handful of instances in the kernel that
use cmpxchg() or xchg() on u8/u16 variables, it would seem best to just
disallow those completely and have a separate set of functions here,
with only 64-bit architectures using any variable-type wrapper to handle
both 32-bit and 64-bit arguments.
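The "emulated through cmpxchg loop" entry in the list above can be
illustrated in user space. This is a minimal, hypothetical sketch:
cmpxchg_u8_emulated() is not a kernel function, the GCC/clang __atomic
builtins stand in for an architecture's native 32-bit cmpxchg, and it
assumes the naturally aligned 32-bit word containing the byte may be
accessed. The byte's position in that word is derived from the
compiler's __BYTE_ORDER__ macro.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: emulate an 8-bit compare-and-swap with a 32-bit
 * compare-and-swap on the aligned word that contains the byte.
 * Like cmpxchg(), it returns the previous value of the byte.
 */
static uint8_t cmpxchg_u8_emulated(uint8_t *p, uint8_t old, uint8_t new_val)
{
	uintptr_t addr = (uintptr_t)p;
	/* Naturally aligned 32-bit word containing *p. */
	uint32_t *word = (uint32_t *)(addr & ~(uintptr_t)3);
	unsigned int offset = (unsigned int)(addr & 3);
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
	unsigned int shift = offset * 8;
#else
	unsigned int shift = (3 - offset) * 8;
#endif
	uint32_t mask = (uint32_t)0xff << shift;
	uint32_t cur = __atomic_load_n(word, __ATOMIC_RELAXED);

	assert(shift < 32);
	for (;;) {
		uint8_t cur_byte = (uint8_t)((cur & mask) >> shift);

		/* Target byte differs from 'old': fail, report current value. */
		if (cur_byte != old)
			return cur_byte;

		/* Replace only our byte, keep the three neighbouring bytes. */
		uint32_t next = (cur & ~mask) | ((uint32_t)new_val << shift);

		/*
		 * On failure 'cur' is refreshed with the word's current
		 * contents and we re-check, because the change may have
		 * hit a neighbouring byte rather than ours.
		 */
		if (__atomic_compare_exchange_n(word, &cur, next, false,
						__ATOMIC_SEQ_CST, __ATOMIC_RELAXED))
			return old;
	}
}
```

The retry is the essential part of this scheme: a concurrent store to a
neighbouring byte makes the 32-bit compare fail even though the target
byte still holds the expected value, so the loop must re-check instead
of reporting failure.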
Interestingly, the s390 version using __sync_val_compare_and_swap()
seems to produce nice output on all architectures that have atomic
instructions, with any supported compiler, to the point where I think
we could just use that to replace most of the inline-asm versions except
for arm64:

#define cmpxchg(ptr, o, n)                                              \
({                                                                      \
        __typeof__(*(ptr)) __o = (o);                                   \
        __typeof__(*(ptr)) __n = (n);                                   \
        (__typeof__(*(ptr))) __sync_val_compare_and_swap((ptr), __o, __n); \
})

Not sure how gcc's acquire/release behavior of
__sync_val_compare_and_swap() relates to what the kernel wants here.

The gcc documentation also recommends using the standard
__atomic_compare_exchange_n() builtin instead, which would allow
constructing release/acquire/relaxed versions as well, but I could not
get it to produce equally good output (possibly I was using it wrong).

      Arnd
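One way to build the release/acquire/relaxed variants mentioned above
is to pass the memory order through to __atomic_compare_exchange_n(),
whose GCC signature is (ptr, expected, desired, weak,
success_memorder, failure_memorder). This is a hypothetical sketch,
not kernel code: the names cmpxchg_order(), cmpxchg_*_sketch() and
cmpxchg_demo() are invented for illustration. The failure order is
assumed to be __ATOMIC_RELAXED throughout, since GCC requires the
failure order to be no stronger than the success order and never a
release order.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical cmpxchg()-style macro with an explicit success ordering.
 * Evaluates to the value *ptr held before the operation: on success
 * __old still equals 'o'; on failure the builtin overwrites __old with
 * the current contents of *ptr.
 */
#define cmpxchg_order(ptr, o, n, order)                                 \
({                                                                      \
        __typeof__(*(ptr)) __old = (o);                                 \
        __atomic_compare_exchange_n((ptr), &__old, (n), false,          \
                                    (order), __ATOMIC_RELAXED);         \
        __old;                                                          \
})

#define cmpxchg_relaxed_sketch(ptr, o, n) \
        cmpxchg_order(ptr, o, n, __ATOMIC_RELAXED)
#define cmpxchg_acquire_sketch(ptr, o, n) \
        cmpxchg_order(ptr, o, n, __ATOMIC_ACQUIRE)
#define cmpxchg_release_sketch(ptr, o, n) \
        cmpxchg_order(ptr, o, n, __ATOMIC_RELEASE)

/* Usage example (hypothetical): a successful acquire-ordered cmpxchg. */
static inline int cmpxchg_demo(void)
{
        int v = 1;
        int prev = cmpxchg_acquire_sketch(&v, 1, 2);

        assert(prev == 1 && v == 2);
        return v;
}
```

This only shows the shape of the interface; whether it generates code
as good as the __sync_val_compare_and_swap() version on each
architecture is the open question raised above.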