On Wed, 22 Nov 2023 at 09:52, Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> Still not actually tested, but the code generation on x86 looks
> reasonable, so it might be worth looking at whether it helps the
> RISC-V case.

Doing some more munging, and actually looking at RISC-V code
generation too (I obviously had to enable ARCH_USE_CMPXCHG_LOCKREF
for RISC-V), I get this:

  lockref_get:
        addi    sp,sp,-32
        sd      s0,16(sp)
        sd      s1,8(sp)
        sd      ra,24(sp)
        addi    s0,sp,32
        li      a1,65536
        ld      a5,0(a0)
        mv      s1,a0
        addi    a1,a1,-1
        li      a0,100
  .L43:
        sext.w  a3,a5
        li      a4,1
        srliw   a2,a5,16
        and     a3,a3,a1
        slli    a4,a4,32
        bne     a2,a3,.L49
        add     a4,a5,a4
  0:    lr.d    a3, 0(s1)
        bne     a3, a5, 1f
        sc.d.rl a2, a4, 0(s1)
        bnez    a2, 0b
        fence   rw, rw
  1:
        bne     a5,a3,.L52
        ld      ra,24(sp)
        ld      s0,16(sp)
        ld      s1,8(sp)
        addi    sp,sp,32
        jr      ra
        ...

so now that single update is indeed just one single instruction:

        add     a4,a5,a4

is that "increment count in the high 32 bits".

The ticket-lock-being-unlocked checks are those

        li      a1,65536
        ...
        sext.w  a3,a5
        srliw   a2,a5,16
        and     a3,a3,a1
        bne     a2,a3,.L49

instructions, if I read it right.

That actually looks fairly close to optimal, although the frame setup
is kind of sad.

(The above does not include the "loop if the cmpxchg failed" part of
the code generation.)

Anyway, apart from enabling LOCKREF, the patch to get this for RISC-V
is attached. I'm not going to play with this any more, but you might
want to check whether this actually does work on RISC-V.

Because I only looked at the code generation, I didn't actually look
at whether it *worked*.

              Linus
From 168f35850c15468941e597907e33daacd179d54a Mon Sep 17 00:00:00 2001
From: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Date: Wed, 22 Nov 2023 09:33:29 -0800
Subject: [PATCH] lockref: improve code generation for ref updates

Our lockref data structure is two 32-bit words laid out next to each
other, combining the spinlock and the count into one entity that can
be accessed atomically together.

In particular, the structure is laid out so that the count is the
upper 32-bit word (on little-endian), so that you can do basic
arithmetic on the count in 64 bits: instead of adding one to the
32-bit word, you can just add a value shifted by 32 to the full
64-bit word.

Sadly, neither gcc nor clang are quite clever enough to work that out
on their own, so this does that "manually".

Also, try to do any compares against zero values, which generally
improves the code generation. So rather than check that the value was
at least 1 before a decrement, check that it's positive or zero after
the decrement. We don't worry about the overflow point in lockrefs.

Cc: Guo Ren <guoren@xxxxxxxxxx>
Signed-off-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
---
 lib/lockref.c | 29 ++++++++++++++++++++---------
 1 file changed, 20 insertions(+), 9 deletions(-)

diff --git a/lib/lockref.c b/lib/lockref.c
index 2afe4c5d8919..f3c30c538af1 100644
--- a/lib/lockref.c
+++ b/lib/lockref.c
@@ -26,6 +26,17 @@
 	}								\
 } while (0)
 
+/*
+ * The compiler isn't smart enough to do the count
+ * increment in the high 32 bits of the 64-bit value,
+ * so do this optimization by hand.
+ */
+#if defined(__LITTLE_ENDIAN) && BITS_PER_LONG == 64
+ #define LOCKREF_ADD(n,x) ((n).lock_count += (unsigned long)(x)<<32)
+#else
+ #define LOCKREF_ADD(n,x) ((n).count += (x))
+#endif
+
 #else
 
 #define CMPXCHG_LOOP(CODE, SUCCESS) do { } while (0)
@@ -42,7 +53,7 @@
 void lockref_get(struct lockref *lockref)
 {
 	CMPXCHG_LOOP(
-		new.count++;
+		LOCKREF_ADD(new,1);
 	,
 		return;
 	);
@@ -63,9 +74,9 @@ int lockref_get_not_zero(struct lockref *lockref)
 	int retval;
 
 	CMPXCHG_LOOP(
-		new.count++;
 		if (old.count <= 0)
 			return 0;
+		LOCKREF_ADD(new,1);
 	,
 		return 1;
 	);
@@ -91,8 +102,8 @@ int lockref_put_not_zero(struct lockref *lockref)
 	int retval;
 
 	CMPXCHG_LOOP(
-		new.count--;
-		if (old.count <= 1)
+		LOCKREF_ADD(new,-1);
+		if (new.count <= 0)
 			return 0;
 	,
 		return 1;
@@ -119,8 +130,8 @@ EXPORT_SYMBOL(lockref_put_not_zero);
 int lockref_put_return(struct lockref *lockref)
 {
 	CMPXCHG_LOOP(
-		new.count--;
-		if (old.count <= 0)
+		LOCKREF_ADD(new,-1);
+		if (new.count < 0)
 			return -1;
 	,
 		return new.count;
@@ -137,8 +148,8 @@ EXPORT_SYMBOL(lockref_put_return);
 int lockref_put_or_lock(struct lockref *lockref)
 {
 	CMPXCHG_LOOP(
-		new.count--;
-		if (old.count <= 1)
+		LOCKREF_ADD(new,-1);
+		if (new.count <= 0)
 			break;
 	,
 		return 1;
@@ -174,9 +185,9 @@ int lockref_get_not_dead(struct lockref *lockref)
 	int retval;
 
 	CMPXCHG_LOOP(
-		new.count++;
 		if (old.count < 0)
 			return 0;
+		LOCKREF_ADD(new,1);
 	,
 		return 1;
 	);
-- 
2.43.0.5.g38fb137bdb