On Wed, Apr 05, 2023 at 09:37:04AM -0700, Dave Hansen wrote:
> On 4/5/23 07:17, Uros Bizjak wrote:
> > Add generic and target specific support for local{,64}_try_cmpxchg
> > and wire up support for all targets that use local_t infrastructure.
>
> I feel like I'm missing some context.
>
> What are the actual end user visible effects of this series? Is there a
> measurable decrease in perf overhead? Why go to all this trouble for
> perf? Who else will use local_try_cmpxchg()?

Overall, the theory is that it can generate slightly better code (e.g. by
reusing the flags on x86). In practice, that might be in the noise, but as
demonstrated in prior postings the code generation is no worse than before.

From my perspective, the more important part is that this aligns local_t with
the other atomic*_t APIs, which all have ${atomictype}_try_cmpxchg(), and for
consistency/legibility/maintainability it's nice to be able to use the same
code patterns, e.g.

	${inttype} new, old = ${atomictype}_read(ptr);
	do {
		...
		new = do_something_with(old);
	} while (!${atomictype}_try_cmpxchg(ptr, &old, new));

> I'm all for improving things, and perf is an important user. But, if
> the goal here is improving performance, it would be nice to see at least
> a stab at quantifying the performance delta.

IIUC, Steve's original request for local_try_cmpxchg() was a combination of a
theoretical performance benefit and a more general preference to use
try_cmpxchg() for consistency / better structure of the source code:

  https://lore.kernel.org/lkml/20230301131831.6c8d4ff5@xxxxxxxxxxxxxxxxxx/

I agree it'd be nice to have performance figures, but I think those would only
need to demonstrate a lack of a regression rather than a performance
improvement, and I think it's fairly clear from eyeballing the generated
instructions that a regression isn't likely.

Thanks,
Mark.
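
As a rough user-space illustration of the read / modify / try_cmpxchg loop
pattern in the message above, the sketch below uses C11
atomic_compare_exchange_strong() as a stand-in for the kernel's try_cmpxchg()
family; both report success as a boolean and write the currently stored value
back into the "old" argument on failure. This is an assumption-laden sketch,
not kernel code: demo_try_cmpxchg(), demo_add_capped() and demo_usage() are
hypothetical names, and the saturating add is only a placeholder for
do_something_with(old).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical user-space stand-in for ${atomictype}_try_cmpxchg().
 * Returns true if *ptr was equal to *old and has been replaced by new;
 * otherwise returns false and stores the current value of *ptr in *old.
 */
bool demo_try_cmpxchg(atomic_long *ptr, long *old, long new)
{
	return atomic_compare_exchange_strong(ptr, old, new);
}

/*
 * Hypothetical example of the loop pattern: add delta to *ptr, but never
 * let the stored value exceed limit. Returns the value that was stored.
 */
long demo_add_capped(atomic_long *ptr, long delta, long limit)
{
	long new, old = atomic_load(ptr);

	do {
		/* Placeholder for do_something_with(old). */
		new = old + delta;
		if (new > limit)
			new = limit;
		/* On failure, old has been refreshed, so just recompute new. */
	} while (!demo_try_cmpxchg(ptr, &old, new));

	return new;
}

/* Small single-threaded usage example. */
void demo_usage(void)
{
	atomic_long counter;

	atomic_init(&counter, 5);
	assert(demo_add_capped(&counter, 3, 12) == 8);   /* 5 + 3 */
	assert(demo_add_capped(&counter, 10, 12) == 12); /* capped at 12 */
	assert(atomic_load(&counter) == 12);
}
```

Because a failed compare-exchange refreshes old, the loop body does not need a
separate re-read of the variable before retrying, which is the structural
simplification the try_cmpxchg() form is meant to offer over a plain cmpxchg()
loop.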