Re: lockless case of retain_dentry() (was Re: [PATCH 09/15] fold the call of retain_dentry() into fast_dput())

On Wed, Nov 22, 2023 at 11:11:38AM -0800, Linus Torvalds wrote:
> On Wed, 22 Nov 2023 at 09:52, Linus Torvalds
> <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> >
> > Still not actually tested, but the code generation on x86 looks
> > reasonable, so it might be worth looking at whether it helps the
> > RISC-V case.
> 
> Doing some more munging, and actually looking at RISC-V code
> generation too (I obviously had to enable ARCH_USE_CMPXCHG_LOCKREF for
> RISC-V).
> 
> I get this:
> 
>   lockref_get:
>         addi    sp,sp,-32
>         sd      s0,16(sp)
>         sd      s1,8(sp)
>         sd      ra,24(sp)
>         addi    s0,sp,32
>         li      a1,65536
>         ld      a5,0(a0)
>         mv      s1,a0
>         addi    a1,a1,-1
>         li      a0,100
>   .L43:
>         sext.w  a3,a5
>         li      a4,1
>         srliw   a2,a5,16
>         and     a3,a3,a1
>         slli    a4,a4,32
>         bne     a2,a3,.L49
>         add     a4,a5,a4
>   0:
>         lr.d a3, 0(s1)
>         bne a3, a5, 1f
>         sc.d.rl a2, a4, 0(s1)
>         bnez a2, 0b
>         fence rw, rw
>   1:
>         bne     a5,a3,.L52
>         ld      ra,24(sp)
>         ld      s0,16(sp)
>         ld      s1,8(sp)
>         addi    sp,sp,32
>         jr      ra
>   ...
> 
> so now that single update is indeed just one single instruction:
> 
>         add     a4,a5,a4
> 
> which is the "increment count in the high 32 bits" operation.
> 
> The ticket lock being unlocked checks are those
> 
>         li      a1,65536
>         sext.w  a3,a5
>         srliw   a2,a5,16
>         and     a3,a3,a1
>         bne     a2,a3,.L49
> 
> instructions if I read it right.
> 
> That actually looks fairly close to optimal, although the frame setup
> is kind of sad.
> 
> (The above does not include the "loop if the cmpxchg failed" part of
> the code generation)
> 
> Anyway, apart from enabling LOCKREF, the patch to get this for RISC-V
> is attached.
> 
> I'm not going to play with this any more, but you might want to check
> whether this actually does work on RISC-V.
> 
> Because I only looked at the code generation, I didn't actually look at
> whether it *worked*.
> 
>                 Linus

> From 168f35850c15468941e597907e33daacd179d54a Mon Sep 17 00:00:00 2001
> From: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
> Date: Wed, 22 Nov 2023 09:33:29 -0800
> Subject: [PATCH] lockref: improve code generation for ref updates
> 
> Our lockref data structure is two 32-bit words laid out next to each
> other, combining the spinlock and the count into one entity that can be
> accessed atomically together.
> 
> In particular, the structure is laid out so that the count is the upper
> 32 bit word (on little-endian), so that you can do basic arithmetic on
> the count in 64 bits: instead of adding one to the 32-bit word, you can
> just add a value shifted by 32 to the full 64-bit word.
> 
> Sadly, neither gcc nor clang are quite clever enough to work that out on
> their own, so this does that "manually".
> 
> Also, try to do compares against zero, which generally improves the
> code generation.  So rather than check that the value was
> at least 1 before a decrement, check that it's positive or zero after
> the decrement.  We don't worry about the overflow point in lockrefs.
Tested-by: Guo Ren <guoren@xxxxxxxxxx>

This patch helps riscv eliminate 3 ALU instructions in the count update:

Before the patch:
000000000000020c <lockref_get>:
        CMPXCHG_LOOP(
 20c:   611c                    ld      a5,0(a0)

000000000000020e <.LBB492>:
 20e:   03079713                sll     a4,a5,0x30
 212:   0107d69b                srlw    a3,a5,0x10
 216:   9341                    srl     a4,a4,0x30
 218:   02e69663                bne     a3,a4,244 <.L40>

000000000000021c <.LBB494>:
 21c:   4207d693                sra     a3,a5,0x20    -------+
 220:   02079713                sll     a4,a5,0x20	     |
 224:   2685                    addw    a3,a3,1		     |
 226:   1682                    sll     a3,a3,0x20	     |
 228:   9301                    srl     a4,a4,0x20	     |
 22a:   8f55                    or      a4,a4,a3      -------+

000000000000022c <.L0^B4>:
 22c:   100536af                lr.d    a3,(a0)
 230:   00f69763                bne     a3,a5,23e <.L1^B5>
 234:   1ae5362f                sc.d.rl a2,a4,(a0)
 238:   fa75                    bnez    a2,22c <.L0^B4>
 23a:   0330000f                fence   rw,rw

After the patch:
000000000000020c <lockref_get>:
        CMPXCHG_LOOP(
 20c:   611c                    ld      a5,0(a0)

000000000000020e <.LBB526>:
 20e:   03079713                sll     a4,a5,0x30
 212:   0107d69b                srlw    a3,a5,0x10
 216:   9341                    srl     a4,a4,0x30
 218:   02e69163                bne     a3,a4,23a <.L40>

000000000000021c <.LBB528>:
 21c:   4705                    li      a4,1		------+
 21e:   1702                    sll     a4,a4,0x20	      |
 220:   973e                    add     a4,a4,a5	------+

0000000000000222 <.L0^B4>:
 222:   100536af                lr.d    a3,(a0)
 226:   00f69763                bne     a3,a5,234 <.L1^B5>
 22a:   1ae5362f                sc.d.rl a2,a4,(a0)
 22e:   fa75                    bnez    a2,222 <.L0^B4>
 230:   0330000f                fence   rw,rw

> 
> Cc: Guo Ren <guoren@xxxxxxxxxx>
> Signed-off-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
> ---
>  lib/lockref.c | 29 ++++++++++++++++++++---------
>  1 file changed, 20 insertions(+), 9 deletions(-)
> 
> diff --git a/lib/lockref.c b/lib/lockref.c
> index 2afe4c5d8919..f3c30c538af1 100644
> --- a/lib/lockref.c
> +++ b/lib/lockref.c
> @@ -26,6 +26,17 @@
>  	}									\
>  } while (0)
>  
> +/*
> + * The compiler isn't smart enough to do the count
> + * increment in the high 32 bits of the 64-bit value,
> + * so do this optimization by hand.
> + */
> +#if defined(__LITTLE_ENDIAN) && BITS_PER_LONG == 64
> + #define LOCKREF_ADD(n,x) ((n).lock_count += (unsigned long)(x)<<32)
> +#else
> + #define LOCKREF_ADD(n,x) ((n).count += (x))
> +#endif
> +
>  #else
>  
>  #define CMPXCHG_LOOP(CODE, SUCCESS) do { } while (0)
> @@ -42,7 +53,7 @@
>  void lockref_get(struct lockref *lockref)
>  {
>  	CMPXCHG_LOOP(
> -		new.count++;
> +		LOCKREF_ADD(new,1);
>  	,
>  		return;
>  	);
> @@ -63,9 +74,9 @@ int lockref_get_not_zero(struct lockref *lockref)
>  	int retval;
>  
>  	CMPXCHG_LOOP(
> -		new.count++;
>  		if (old.count <= 0)
>  			return 0;
> +		LOCKREF_ADD(new,1);
>  	,
>  		return 1;
>  	);
> @@ -91,8 +102,8 @@ int lockref_put_not_zero(struct lockref *lockref)
>  	int retval;
>  
>  	CMPXCHG_LOOP(
> -		new.count--;
> -		if (old.count <= 1)
> +		LOCKREF_ADD(new,-1);
> +		if (new.count <= 0)
>  			return 0;
>  	,
>  		return 1;
> @@ -119,8 +130,8 @@ EXPORT_SYMBOL(lockref_put_not_zero);
>  int lockref_put_return(struct lockref *lockref)
>  {
>  	CMPXCHG_LOOP(
> -		new.count--;
> -		if (old.count <= 0)
> +		LOCKREF_ADD(new,-1);
> +		if (new.count < 0)
>  			return -1;
>  	,
>  		return new.count;
> @@ -137,8 +148,8 @@ EXPORT_SYMBOL(lockref_put_return);
>  int lockref_put_or_lock(struct lockref *lockref)
>  {
>  	CMPXCHG_LOOP(
> -		new.count--;
> -		if (old.count <= 1)
> +		LOCKREF_ADD(new,-1);
> +		if (new.count <= 0)
>  			break;
>  	,
>  		return 1;
> @@ -174,9 +185,9 @@ int lockref_get_not_dead(struct lockref *lockref)
>  	int retval;
>  
>  	CMPXCHG_LOOP(
> -		new.count++;
>  		if (old.count < 0)
>  			return 0;
> +		LOCKREF_ADD(new,1);
>  	,
>  		return 1;
>  	);
> -- 
> 2.43.0.5.g38fb137bdb
> 




