On Fri, 1 Dec 2023 at 12:36, Guo Ren <guoren@xxxxxxxxxx> wrote: > > I modify your code to guarantee the progress of the comparison failure > situation: Are you sure you want to prefetch when the value doesn't even match the existing value? Aren't you better off just looping doing just reads until you at least have a valid value to exchange? Otherwise you might easily find that your cmpxchg loops cause horrendous cacheline ping-pong patterns. Of course, if your hardware is bad at releasing the written state, that may actually be what you want, to see changes in a timely manner. At least some of our cmpxchg uses are the "try_cmpxchg()" pattern, which wouldn't even loop - and won't write at all - on a value mismatch. And some of those try_cmpxchg cases are a lot more important than the lockref code. Things like spin_trylock() etc. Of course, for best results you might want to have an actual architecture-specific helper for the try_cmpxchg case, and use the compiler for "outputs in condition codes" (but then you need to have fallback cases for older compilers that don't support it). See the code code for example of the kinds of nasty support code you need with /* * Macros to generate condition code outputs from inline assembly, * The output operand must be type "bool". */ #ifdef __GCC_ASM_FLAG_OUTPUTS__ # define CC_SET(c) "\n\t/* output condition code " #c "*/\n" # define CC_OUT(c) "=@cc" #c #else # define CC_SET(c) "\n\tset" #c " %[_cc_" #c "]\n" # define CC_OUT(c) [_cc_ ## c] "=qm" #endif and then a lot of "CC_SET()/CC_OUT()" use in the inline asms in <asm/cmpxchg.h>... IOW, you really should time this and then add the timing information to whatever commit message. Linus