On Tue, Oct 22, 2024 at 12:53:01PM -0700, Dave Hansen wrote: > On 10/17/24 11:18, Peter Zijlstra wrote: > > On Wed, Oct 16, 2024 at 12:44:18PM -0700, Dave Hansen wrote: ... > >> Would anybody hate if we broke this up a bit, like: > >> > >> const typeof(var) _val = val; > >> const int paoconst = __builtin_constant_p(val); > >> const int paoinc = paoconst && ((_val) == 1); > >> const int paodec = paoconst && ((_val) == (typeof(var))-1); > >> > >> and then did > >> > >> if (paoinc) > >> percpu_unary_op(size, qual, "inc", var); > >> ... > > I think that is an overall improvement. Proceed! 🙂 > > I poked at this a bit: Thanks for looking into this! > > https://git.kernel.org/pub/scm/linux/kernel/git/daveh/devel.git/commit/?h=testme&id=30e0899c6ab7fe1134e4b96db963f0be89b1dd5a > > I believe it functions fine. But it surprised me with a few things. > Here's one. I assumed that doing an add((unsigned)-1) would be rare. > It's not. It's actually pretty common because this: > > #define this_cpu_sub(pcp, val) this_cpu_add(pcp, -(typeof(pcp))(val)) > > ends up causing problems when 'pcp' is an unsigned type. For example, > in this chain: > > mem_cgroup_exit -> > obj_cgroup_put -> > percpu_ref_put -> > percpu_ref_put_many(ref, 1) -> > this_cpu_sub > > the compiler can see the '1' constant. It effectively expands to: > > this_cpu_add(pcp, -(unsigned long)(1)) > > With the old code, gcc manages to generate a 'dec'. Clang generates an > 'add'. With my hack above both compilers generate an 'add'. This > actually matters in some code that seems potentially rather performance > sensitive: > > add/remove: 0/0 grow/shrink: 219/9 up/down: 755/-141 (614) > Function old new delta > flush_end_io 905 1070 +165 > x86_pmu_cancel_txn 242 338 +96 > lru_add 554 594 +40 > mlock_folio_batch 3264 3300 +36 > compaction_alloc 3813 3838 +25 > tcp_leave_memory_pressure 86 110 +24 > account_guest_time 270 287 +17 > ... > > So I think Peter's version was the best. It shuts up clang and also > preserves the existing (good) gcc 'sub' behavior. I'll send it out for > real in a bit, but I'm thinking of something like the attached patch. I am fine as long as you keep the (added) test cases and maybe even extend them. I dunno how you will go with the fact that Andrew applied my version already. ... > This can be quickly reproduced by setting CONFIG_WERROR=y and running: > > make W=1 CC=clang-14 net/ipv4/tcp_output.o Hint: You can use LLVM=-14 instead of CC=clang-14. -- With Best Regards, Andy Shevchenko