Use fair spinlocks and rwlocks on RISC-V.

I investigated the use of ticket spinlocks for RISC-V so that we have fair spinlocks under contention. After making the generic changes, I found that queue spinlocks require atomic operations on small words (RISC-V only supports LR/SC on 32-bit or 64-bit words), so this series borrows the small-word atomics support from the MIPS port, updates the RISC-V port to use the generic qspinlock and qrwlock implementations, and finally fixes a bug found during visual inspection of the MIPS small-word atomics support.

The queue spinlocks and rwlocks live in asm-generic, so this series reduces platform-specific code in the RISC-V port. The small-word atomics support it adds expands the generic atomics coverage and is presumably useful elsewhere.

The patch series has been tested successfully with SMP in RISC-V QEMU using the riscv-linux-4.20 branch of https://github.com/riscv/riscv-linux and applies cleanly to torvalds/master.

Note: acquire or release semantics are passed through to the underlying cmpxchg implementation for the minimum atomic operation word size (32 bits). The aligned larger-word load, used to fetch and mask the previous value of the word surrounding the small word, is performed relaxed before the larger-word atomic cmpxchg operation. One assumes the MIPS code has been battle tested; however, the RISC-V Linux memory model has additional ordering constraints for acquire/release:

_relaxed_: the aligned large-word load is relaxed, so this is okay.

_acquire_: the aligned large-word load is covered by the "fence r,rw" acquire barrier _following_ the compare-and-swap operation, so it is correctly ordered before the acquire barrier, and locally it is a syntactic dependency of the compare-and-swap operation, so it is correctly ordered.

_release_: the aligned large-word load occurs before the "fence rw,w" _preceding_ the compare-and-swap, so it is technically a load before a write barrier, and the fence implies additional ordering of the load before the compare-and-swap. This adds extra ordering for the first loop iteration. The load is a dependent load and thus does not require any additional ordering on its own. Ordering could be relaxed by performing the aligned large-word load after the barrier preceding the compare-and-swap; however, this would require a special variant of the cmpxchg asm. The operation is not invalid; rather, the release fence adds explicit ordering for the aligned large-word load that is not strictly required. This may show up as an additional LR/SC loop iteration under contention due to the non-optimal fence placement.

QEMU on x86 is not representative of real hardware and is likely more tolerant than weakly ordered hardware. Further testing is advised, ideally on real hardware or an aggressive out-of-order simulator that has been verified against the RISC-V memory model.
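For illustration only, here is a rough sketch of the small-word cmpxchg approach described above: a 16-bit cmpxchg emulated with a compare-and-swap on the aligned 32-bit word that contains it, shifting and masking within that word. This is not the patch itself; the function name is made up, it uses a full-word cmpxchg() rather than an open-coded LR/SC loop, and it assumes a little-endian layout plus the usual kernel helpers (READ_ONCE(), cmpxchg(), BITS_PER_BYTE):

  /* Sketch only: emulate cmpxchg on a u16 via the containing aligned u32. */
  static u16 cmpxchg_u16_sketch(volatile u16 *ptr, u16 old, u16 new)
  {
          /* Pointer to the aligned 32-bit word containing *ptr. */
          volatile u32 *aligned = (volatile u32 *)((unsigned long)ptr & ~0x3UL);
          /* Bit offset of the halfword within that word (little-endian). */
          unsigned int shift = ((unsigned long)ptr & 0x2UL) * BITS_PER_BYTE;
          u32 mask = 0xffffU << shift;
          u32 load, new32, prev;

          for (;;) {
                  /* Relaxed load of the surrounding word, as discussed above. */
                  load = READ_ONCE(*aligned);

                  /* Fail early if the halfword already differs from 'old'. */
                  if ((u16)(load >> shift) != old)
                          return (u16)(load >> shift);

                  /* Splice the new halfword into the unchanged neighbouring bytes. */
                  new32 = (load & ~mask) | ((u32)new << shift);

                  /*
                   * Full-word compare-and-swap; any acquire/release semantics
                   * are applied here, to the 32-bit operation, which is where
                   * the fence placement discussed above matters.
                   */
                  prev = cmpxchg(aligned, load, new32);
                  if (prev == load)
                          return old;

                  /* Either the halfword or a neighbouring byte changed; retry. */
          }
  }

The retry on a changed neighbouring byte is the price of emulating sub-word atomics with full-word LR/SC, which is also why contention on adjacent bytes can cause extra loop iterations.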
Michael Clark (3):
  RISC-V: implement xchg_small and cmpxchg_small for char and short
  RISC-V: convert custom spinlock/rwlock to generic qspinlock/qrwlock
  MIPS: fix truncation in __cmpxchg_small for short values

 arch/mips/kernel/cmpxchg.c              |   2 +-
 arch/riscv/Kconfig                      |   2 +
 arch/riscv/include/asm/cmpxchg.h        |  54 +++++++++
 arch/riscv/include/asm/mcs_spinlock.h   |   7 ++
 arch/riscv/include/asm/qrwlock.h        |   8 ++
 arch/riscv/include/asm/qspinlock.h      |   8 ++
 arch/riscv/include/asm/spinlock.h       | 141 +----------------------
 arch/riscv/include/asm/spinlock_types.h |  33 +-----
 arch/riscv/kernel/Makefile              |   1 +
 arch/riscv/kernel/cmpxchg.c             | 118 ++++++++++++++++++++
 10 files changed, 206 insertions(+), 168 deletions(-)
 create mode 100644 arch/riscv/include/asm/mcs_spinlock.h
 create mode 100644 arch/riscv/include/asm/qrwlock.h
 create mode 100644 arch/riscv/include/asm/qspinlock.h
 create mode 100644 arch/riscv/kernel/cmpxchg.c

-- 
2.17.1