On Thu, Aug 11, 2022 at 3:22 PM Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> Here's a recreation of that patch I mentioned where the OP_COMPARE is
> moved out of the loop. Just for fun, look at how much better the code
> generation is for the common case when you don't have the call messing
> up the clobbered registers etc.

Oh, sadly, clang does much worse here.

Gcc ends up being able to not have a stack frame at all for
__d_lookup_rcu() once that DCACHE_OP_COMPARE case has been moved out.
The gcc code really looks very nice.

Clang, not so much, and it still has spills and reloads.

The loop still ends up better with clang (since that test is no longer
in the loop), but the code generated doesn't go from "ugly to really
nice", it just goes from "ugly to still somewhat ugly".

               Linus
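The change being discussed tests the DCACHE_OP_COMPARE flag once, before the lookup loop, instead of on every iteration, so the common-case loop body no longer contains a call. A minimal sketch of that general transformation (loop unswitching) follows. It is not the kernel's __d_lookup_rcu(): the structs, the TABLE_OP_COMPARE flag, and the lookup_* functions are hypothetical stand-ins chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flag standing in for DCACHE_OP_COMPARE:
 * "this table uses a custom comparison callback". */
#define TABLE_OP_COMPARE 0x1u

struct entry {
    const char *name;
    size_t len;
    struct entry *next;
};

struct table {
    unsigned int flags;
    /* Only consulted when TABLE_OP_COMPARE is set. */
    int (*compare)(const struct entry *e, const char *name, size_t len);
    struct entry *head;
};

/* Default comparison: exact length and byte match. */
static int default_match(const struct entry *e, const char *name, size_t len)
{
    return e->len == len && memcmp(e->name, name, len) == 0;
}

/* Before: the flag test and a possible indirect call sit inside the loop,
 * so register allocation for the whole loop has to account for the call
 * (caller-saved registers are typically spilled around it). */
struct entry *lookup_test_in_loop(const struct table *t, const char *name, size_t len)
{
    assert(t != NULL);
    for (struct entry *e = t->head; e != NULL; e = e->next) {
        if (t->flags & TABLE_OP_COMPARE) {
            if (t->compare(e, name, len))
                return e;
        } else if (default_match(e, name, len)) {
            return e;
        }
    }
    return NULL;
}

/* Separate slow-path loop, used only when a custom compare is present. */
static struct entry *lookup_slow(const struct table *t, const char *name, size_t len)
{
    for (struct entry *e = t->head; e != NULL; e = e->next)
        if (t->compare(e, name, len))
            return e;
    return NULL;
}

/* After: the flag is tested once, before the loop; the common-case loop
 * itself contains no indirect call. */
struct entry *lookup_hoisted(const struct table *t, const char *name, size_t len)
{
    assert(t != NULL);
    if (t->flags & TABLE_OP_COMPARE)
        return lookup_slow(t, name, len);
    for (struct entry *e = t->head; e != NULL; e = e->next)
        if (default_match(e, name, len))
            return e;
    return NULL;
}
```

Both versions return the same entry for the same input; only the shape of the loop changes. Whether the fast loop then gets by without a stack frame is up to the compiler, and as the message reports, gcc and clang differ here. This sketch makes no claim about the code either one emits.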