On Fri, Dec 01, 2023 at 02:15:15PM +0900, Linus Torvalds wrote: > On Fri, 1 Dec 2023 at 12:36, Guo Ren <guoren@xxxxxxxxxx> wrote: > > > > I modify your code to guarantee the progress of the comparison failure > > situation: > > Are you sure you want to prefetch when the value doesn't even match > the existing value? Aren't you better off just looping doing just > reads until you at least have a valid value to exchange? Oops, you are right, I'm wrong here. Here is what I want to say: + " prefetch %2\n" \ "0: lr.w %0, %2\n" \ " bne %0, %z3, 1f\n" \ " sc.w %1, %z4, %2\n" \ " beqz %1, 1f\n" \ " prefetchw %2\n" \ " j 0b\n" \ "1:\n" \ Just add a prefetch shared cache line before, which could stick the shared cache line several cycles to ensure the outer cmpxchg loop could make progress. All we wrote here is not for actual code, and it's just a reference for hardware guys. - lr could imply a sticky shared cache line. - sc could imply a sticky unique cache line when sc fails. > > Otherwise you might easily find that your cmpxchg loops cause > horrendous cacheline ping-pong patterns. > > Of course, if your hardware is bad at releasing the written state, > that may actually be what you want, to see changes in a timely manner. > > At least some of our cmpxchg uses are the "try_cmpxchg()" pattern, > which wouldn't even loop - and won't write at all - on a value > mismatch. > > And some of those try_cmpxchg cases are a lot more important than the > lockref code. Things like spin_trylock() etc. Of course, for best > results you might want to have an actual architecture-specific helper > for the try_cmpxchg case, and use the compiler for "outputs in > condition codes" (but then you need to have fallback cases for older > compilers that don't support it). > > See the code code for example of the kinds of nasty support code you need with > > /* > * Macros to generate condition code outputs from inline assembly, > * The output operand must be type "bool". > */ > #ifdef __GCC_ASM_FLAG_OUTPUTS__ > # define CC_SET(c) "\n\t/* output condition code " #c "*/\n" > # define CC_OUT(c) "=@cc" #c > #else > # define CC_SET(c) "\n\tset" #c " %[_cc_" #c "]\n" > # define CC_OUT(c) [_cc_ ## c] "=qm" > #endif > > and then a lot of "CC_SET()/CC_OUT()" use in the inline asms in > <asm/cmpxchg.h>... Thanks for the tip. It's helpful to try_cmpxchg optimization. > > IOW, you really should time this and then add the timing information > to whatever commit message. > > Linus >