On Sun, 26 Nov 2023 at 08:39, Guo Ren <guoren@xxxxxxxxxx> wrote:
>
> Here is my optimization advice:
>
> #define CMPXCHG_LOOP(CODE, SUCCESS) do {                        \
>         int retry = 100;                                        \
>         struct lockref old;                                     \
>         BUILD_BUG_ON(sizeof(old) != 8);                         \
> +       prefetchw(lockref);                                     \\

No. We're not adding software prefetches to generic code. Been there,
done that. They *never* improve performance on good hardware. They end
up helping on some random (usually particularly bad) microarchitecture,
and then they hurt everybody else.

And the real optimization advice is: "don't run on crap hardware".

It really is that simple. Good hardware does OoO and sees the future write.

> Micro-arch could give prefetchw more guarantee:

Well, in practice, they never do, and in fact they are often buggy and
cause problems because they weren't actually tested very much.

                 Linus