Re: lockless case of retain_dentry() (was Re: [PATCH 09/15] fold the call of retain_dentry() into fast_dput())

On Wed, 22 Nov 2023 at 09:52, Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> Still not actually tested, but the code generation on x86 looks
> reasonable, so it might be worth looking at whether it helps the
> RISC-V case.

Doing some more munging, and actually looking at RISC-V code
generation too (I obviously had to enable ARCH_USE_CMPXCHG_LOCKREF for
RISC-V).

I get this:

  lockref_get:
        addi    sp,sp,-32
        sd      s0,16(sp)
        sd      s1,8(sp)
        sd      ra,24(sp)
        addi    s0,sp,32
        li      a1,65536
        ld      a5,0(a0)
        mv      s1,a0
        addi    a1,a1,-1
        li      a0,100
  .L43:
        sext.w  a3,a5
        li      a4,1
        srliw   a2,a5,16
        and     a3,a3,a1
        slli    a4,a4,32
        bne     a2,a3,.L49
        add     a4,a5,a4
  0:
        lr.d a3, 0(s1)
        bne a3, a5, 1f
        sc.d.rl a2, a4, 0(s1)
        bnez a2, 0b
        fence rw, rw
  1:
        bne     a5,a3,.L52
        ld      ra,24(sp)
        ld      s0,16(sp)
        ld      s1,8(sp)
        addi    sp,sp,32
        jr      ra
  ...

so now that single count update is indeed just one single instruction:

        add     a4,a5,a4

which is the "increment count in the high 32 bits" operation.
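As a rough C sketch of what that one add is doing (a hand-written
illustration with a made-up function name, not the kernel code itself;
the real thing is the LOCKREF_ADD macro in the patch below):

  /*
   * On 64-bit little-endian, the 32-bit count occupies the high half
   * of the combined lock+count word, so a single 64-bit add bumps the
   * count without touching the spinlock in the low half.
   */
  static unsigned long example_add_to_count(unsigned long lock_count, int n)
  {
          return lock_count + ((unsigned long)n << 32);
  }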

The "is the ticket lock unlocked" checks are these instructions, if I
read it right:

        li      a1,65536
        addi    a1,a1,-1
        sext.w  a3,a5
        srliw   a2,a5,16
        and     a3,a3,a1
        bne     a2,a3,.L49
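In C terms that test is roughly the following (a sketch, assuming the
generic ticket-lock layout with the owner ticket in the low 16 bits and
the "next" ticket in the 16 bits above it; the function name is made
up):

  static int example_ticket_is_unlocked(unsigned int lockval)
  {
          /* unlocked when the "next" ticket equals the "owner" ticket */
          return (lockval >> 16) == (lockval & 0xffff);
  }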

That actually looks fairly close to optimal, although the frame setup
is kind of sad.

(The above does not include the "loop if the cmpxchg failed" part of
the code generation)
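For reference, the retry structure being compiled here is the
CMPXCHG_LOOP macro in lib/lockref.c, which is roughly this shape (a
paraphrased sketch, not the verbatim macro; the "li a0,100" in the
listing above is the retry counter):

  int retry = 100;
  struct lockref old, new;

  old.lock_count = READ_ONCE(lockref->lock_count);
  while (likely(arch_spin_value_unlocked(old.lock.rlock.raw_lock))) {
          new = old;
          /* per-operation update goes here, e.g. LOCKREF_ADD(new, 1) */
          if (likely(try_cmpxchg64_relaxed(&lockref->lock_count,
                                           &old.lock_count,
                                           new.lock_count)))
                  return;                 /* success */
          if (!--retry)
                  break;                  /* give up, take the spinlock */
          cpu_relax();
  }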

Anyway, apart from enabling LOCKREF, the patch to get this for RISC-V
is attached.

I'm not going to play with this any more, but you might want to check
whether this actually does work on RISC-V.

Because I only looked at the code generation, I didn't actually look at
whether it *worked*.

                Linus
From 168f35850c15468941e597907e33daacd179d54a Mon Sep 17 00:00:00 2001
From: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Date: Wed, 22 Nov 2023 09:33:29 -0800
Subject: [PATCH] lockref: improve code generation for ref updates

Our lockref data structure is two 32-bit words laid out next to each
other, combining the spinlock and the count into one entity that can be
accessed atomically together.

In particular, the structure is laid out so that the count is the upper
32-bit word (on little-endian), so that you can do basic arithmetic on
the count in 64 bits: instead of adding one to the 32-bit word, you can
just add a value shifted by 32 to the full 64-bit word.

Sadly, neither gcc nor clang is quite clever enough to work that out on
its own, so this does that "manually".

Also, try to do any compares against zero, which generally improves the
code generation.  So rather than checking that the value was at least 1
before a decrement, check that it's positive or zero after the
decrement.  We don't worry about the overflow point in lockrefs.
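As a hedged illustration of that shape (a made-up helper, not the
lockref code itself):

  /* decrement first, then compare the fresh result against zero */
  static int example_put_not_zero(int count)
  {
          int new = count - 1;
          return new <= 0 ? 0 : 1;        /* 0: refused, 1: decremented */
  }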

Cc: Guo Ren <guoren@xxxxxxxxxx>
Signed-off-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
---
 lib/lockref.c | 29 ++++++++++++++++++++---------
 1 file changed, 20 insertions(+), 9 deletions(-)

diff --git a/lib/lockref.c b/lib/lockref.c
index 2afe4c5d8919..f3c30c538af1 100644
--- a/lib/lockref.c
+++ b/lib/lockref.c
@@ -26,6 +26,17 @@
 	}									\
 } while (0)
 
+/*
+ * The compiler isn't smart enough to do the count
+ * increment in the high 32 bits of the 64-bit value,
+ * so do this optimization by hand.
+ */
+#if defined(__LITTLE_ENDIAN) && BITS_PER_LONG == 64
+ #define LOCKREF_ADD(n,x) ((n).lock_count += (unsigned long)(x)<<32)
+#else
+ #define LOCKREF_ADD(n,x) ((n).count += (x))
+#endif
+
 #else
 
 #define CMPXCHG_LOOP(CODE, SUCCESS) do { } while (0)
@@ -42,7 +53,7 @@
 void lockref_get(struct lockref *lockref)
 {
 	CMPXCHG_LOOP(
-		new.count++;
+		LOCKREF_ADD(new,1);
 	,
 		return;
 	);
@@ -63,9 +74,9 @@ int lockref_get_not_zero(struct lockref *lockref)
 	int retval;
 
 	CMPXCHG_LOOP(
-		new.count++;
 		if (old.count <= 0)
 			return 0;
+		LOCKREF_ADD(new,1);
 	,
 		return 1;
 	);
@@ -91,8 +102,8 @@ int lockref_put_not_zero(struct lockref *lockref)
 	int retval;
 
 	CMPXCHG_LOOP(
-		new.count--;
-		if (old.count <= 1)
+		LOCKREF_ADD(new,-1);
+		if (new.count <= 0)
 			return 0;
 	,
 		return 1;
@@ -119,8 +130,8 @@ EXPORT_SYMBOL(lockref_put_not_zero);
 int lockref_put_return(struct lockref *lockref)
 {
 	CMPXCHG_LOOP(
-		new.count--;
-		if (old.count <= 0)
+		LOCKREF_ADD(new,-1);
+		if (new.count < 0)
 			return -1;
 	,
 		return new.count;
@@ -137,8 +148,8 @@ EXPORT_SYMBOL(lockref_put_return);
 int lockref_put_or_lock(struct lockref *lockref)
 {
 	CMPXCHG_LOOP(
-		new.count--;
-		if (old.count <= 1)
+		LOCKREF_ADD(new,-1);
+		if (new.count <= 0)
 			break;
 	,
 		return 1;
@@ -174,9 +185,9 @@ int lockref_get_not_dead(struct lockref *lockref)
 	int retval;
 
 	CMPXCHG_LOOP(
-		new.count++;
 		if (old.count < 0)
 			return 0;
+		LOCKREF_ADD(new,1);
 	,
 		return 1;
 	);
-- 
2.43.0.5.g38fb137bdb

