Use fair spinlocks and rwlocks on RISC-V.

I investigated the use of ticket spinlocks for RISC-V so that we have fair spinlocks under contention. After making the generic changes, I found that queue spinlocks require atomic operations on small words (RISC-V only supports LR/SC on 32-bit or 64-bit words), so this series borrows the small-word atomics support from the MIPS port, updates the RISC-V port to use the generic qspinlock and qrwlock implementations, and finally fixes a bug found during visual inspection of the MIPS small-word atomics support.

The queue spinlocks and rwlocks live in asm-generic, so this series reduces platform-specific code in the RISC-V port. The small-word atomics support it adds expands the generic atomics coverage and is presumably useful elsewhere.

The patch series has been tested successfully with SMP in RISC-V QEMU using the riscv-linux-4.20 branch of https://github.com/riscv/riscv-linux and applies cleanly to torvalds/master.

Note: acquire or release semantics are passed through to the underlying cmpxchg implementation for the minimum atomic operation word size (32 bits). The aligned larger-word load, used to fetch and mask the previous value of the word surrounding the small word, is performed relaxed before the larger-word atomic cmpxchg operation. One assumes the MIPS code has been battle tested; however, the RISC-V Linux memory model has additional ordering constraints for acquire/release:

_relaxed_: the aligned large-word load is relaxed, so this is okay.

_acquire_: the aligned large-word load is covered by the "fence r,rw" acquire barrier _following_ the compare-and-swap operation, so it is correctly ordered before the acquire barrier, and locally it is a syntactic dependency of the compare-and-swap operation, so it is correctly ordered.

_release_: the aligned large-word load occurs before the "fence rw,w" _preceding_ the compare-and-swap, so it is technically a load before a write barrier, and the fence implies additional ordering of the load before the compare-and-swap. This adds extra ordering for the first loop iteration. The load is a dependent load and thus does not require any additional ordering on its own. Ordering could be relaxed by performing the aligned large-word load after the barrier preceding the compare-and-swap; however, this would require a special variant of the cmpxchg asm. The operation is not invalid; rather, the release fence adds explicit ordering for the aligned large-word load that is not strictly required. This may show up as an additional LR/SC loop iteration under contention due to the non-optimal fence placement.

QEMU on x86 is not representative of real hardware and is likely more tolerant than weakly ordered hardware. Further testing is advised, ideally on real hardware or an aggressive out-of-order simulator that has been verified against the RISC-V memory model.
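For illustration only, here is a rough sketch of the small-word cmpxchg approach described above: a 16-bit cmpxchg emulated with a compare-and-swap on the aligned 32-bit word that contains it, shifting and masking within that word. This is not the patch itself; the function name is made up, it uses a full-word cmpxchg() rather than an open-coded LR/SC loop, and it assumes a little-endian layout plus the usual kernel helpers (READ_ONCE(), cmpxchg(), BITS_PER_BYTE):

  /* Sketch only: emulate cmpxchg on a u16 via the containing aligned u32. */
  static u16 cmpxchg_u16_sketch(volatile u16 *ptr, u16 old, u16 new)
  {
          /* Pointer to the aligned 32-bit word containing *ptr. */
          volatile u32 *aligned = (volatile u32 *)((unsigned long)ptr & ~0x3UL);
          /* Bit offset of the halfword within that word (little-endian). */
          unsigned int shift = ((unsigned long)ptr & 0x2UL) * BITS_PER_BYTE;
          u32 mask = 0xffffU << shift;
          u32 load, new32, prev;

          for (;;) {
                  /* Relaxed load of the surrounding word, as discussed above. */
                  load = READ_ONCE(*aligned);

                  /* Fail early if the halfword already differs from 'old'. */
                  if ((u16)(load >> shift) != old)
                          return (u16)(load >> shift);

                  /* Splice the new halfword into the unchanged neighbouring bytes. */
                  new32 = (load & ~mask) | ((u32)new << shift);

                  /*
                   * Full-word compare-and-swap; any acquire/release semantics
                   * are applied here, to the 32-bit operation, which is where
                   * the fence placement discussed above matters.
                   */
                  prev = cmpxchg(aligned, load, new32);
                  if (prev == load)
                          return old;

                  /* Either the halfword or a neighbouring byte changed; retry. */
          }
  }

The retry on a changed neighbouring byte is the price of emulating sub-word atomics with full-word LR/SC, which is also why contention on adjacent bytes can cause extra loop iterations.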
Michael Clark (3):
  RISC-V: implement xchg_small and cmpxchg_small for char and short
  RISC-V: convert custom spinlock/rwlock to generic qspinlock/qrwlock
  MIPS: fix truncation in __cmpxchg_small for short values

 arch/mips/kernel/cmpxchg.c              |   2 +-
 arch/riscv/Kconfig                      |   2 +
 arch/riscv/include/asm/cmpxchg.h        |  54 +++++++++
 arch/riscv/include/asm/mcs_spinlock.h   |   7 ++
 arch/riscv/include/asm/qrwlock.h        |   8 ++
 arch/riscv/include/asm/qspinlock.h      |   8 ++
 arch/riscv/include/asm/spinlock.h       | 141 +----------------------
 arch/riscv/include/asm/spinlock_types.h |  33 +-----
 arch/riscv/kernel/Makefile              |   1 +
 arch/riscv/kernel/cmpxchg.c             | 118 ++++++++++++++++++++
 10 files changed, 206 insertions(+), 168 deletions(-)
 create mode 100644 arch/riscv/include/asm/mcs_spinlock.h
 create mode 100644 arch/riscv/include/asm/qrwlock.h
 create mode 100644 arch/riscv/include/asm/qspinlock.h
 create mode 100644 arch/riscv/kernel/cmpxchg.c

-- 
2.17.1