Re: lockless case of retain_dentry() (was Re: [PATCH 09/15] fold the call of retain_dentry() into fast_dput())

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Dec 01, 2023 at 02:15:15PM +0900, Linus Torvalds wrote:
> On Fri, 1 Dec 2023 at 12:36, Guo Ren <guoren@xxxxxxxxxx> wrote:
> >
> > I modify your code to guarantee the progress of the comparison failure
> > situation:
> 
> Are you sure you want to prefetch when the value doesn't even match
> the existing value? Aren't you better off just looping doing just
> reads until you at least have a valid value to exchange?
Oops, you are right, I'm wrong here. Here is what I want to say:
+          "       prefetch %2\n"                          \
           "0:     lr.w %0, %2\n"                          \
           "       bne  %0, %z3, 1f\n"                     \
           "       sc.w %1, %z4, %2\n"                     \
           "       beqz %1, 1f\n"                          \
           "       prefetchw %2\n"                         \
           "       j 0b\n"                                 \
           "1:\n"                                          \

Just add a prefetch shared cache line before, which could stick the
shared cache line several cycles to ensure the outer cmpxchg loop could
make progress.

All we wrote here is not for actual code, and it's just a reference for
hardware guys.
 - lr could imply a sticky shared cache line.
 - sc could imply a sticky unique cache line when sc fails.

> 
> Otherwise you might easily find that your cmpxchg loops cause
> horrendous cacheline ping-pong patterns.
> 
> Of course, if your hardware is bad at releasing the written state,
> that may actually be what you want, to see changes in a timely manner.
> 
> At least some of our cmpxchg uses are the "try_cmpxchg()" pattern,
> which wouldn't even loop - and won't write at all - on a value
> mismatch.
> 
> And some of those try_cmpxchg cases are a lot more important than the
> lockref code. Things like spin_trylock() etc. Of course, for best
> results you might want to have an actual architecture-specific helper
> for the try_cmpxchg case, and use the compiler for "outputs in
> condition codes" (but then you need to have fallback cases for older
> compilers that don't support it).
> 
> See the code code for example of the kinds of nasty support code you need with
> 
>   /*
>    * Macros to generate condition code outputs from inline assembly,
>    * The output operand must be type "bool".
>    */
>   #ifdef __GCC_ASM_FLAG_OUTPUTS__
>   # define CC_SET(c) "\n\t/* output condition code " #c "*/\n"
>   # define CC_OUT(c) "=@cc" #c
>   #else
>   # define CC_SET(c) "\n\tset" #c " %[_cc_" #c "]\n"
>   # define CC_OUT(c) [_cc_ ## c] "=qm"
>   #endif
> 
> and then a lot of "CC_SET()/CC_OUT()" use in the inline asms in
> <asm/cmpxchg.h>...
Thanks for the tip. It's helpful to try_cmpxchg optimization.

> 
> IOW, you really should time this and then add the timing information
> to whatever commit message.
> 
>              Linus
> 




[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux