On Mon, Apr 15, 2024 at 10:12 PM Borislav Petkov <bp@xxxxxxxxx> wrote: > > locking/atomic/x86: Introduce arch_try_cmpxchg64_local() > > > > Introduce arch_try_cmpxchg64_local() for 64-bit and 32-bit targets > > to improve code using cmpxchg64_local(). On 64-bit targets, the > > generated assembly improves from: > > > > 3e28: 31 c0 xor %eax,%eax > > 3e2a: 4d 0f b1 7d 00 cmpxchg %r15,0x0(%r13) > > 3e2f: 48 85 c0 test %rax,%rax > > 3e32: 0f 85 9f 00 00 00 jne 3ed7 <...> > > > > to: > > > > 3e28: 31 c0 xor %eax,%eax > > 3e2a: 4d 0f b1 7d 00 cmpxchg %r15,0x0(%r13) > > 3e2f: 0f 85 9f 00 00 00 jne 3ed4 <...> > > > > where a TEST instruction after CMPXCHG is saved. The improvements > > for 32-bit targets are even more noticeable, because double-word > > compare after CMPXCHG8B gets eliminated. > > Ok, maybe I'm missing the point here or maybe the commit message doesn't > explain but how does this big diffstat justify one less insn? We are dealing with locking primitives, probably the hottest part of the kernel. For 64-bits, the patch is effectively a couple of lines, reusing and extending existing macros, the line count for a modern 32-bit target is also a couple of lines, but there the saved insn count is much higher, around 10 instructions. What bothers you is the line count of cmpxchg8b emulation, necessary to handle !CONFIG_X86_CMPXCHG64 targets. They are still alive, and cmpxchg8b emulation code was recently improved (not by me!) to correctly handle the ZF flag. The added code just builds on that. > And no, 32-bit doesn't matter. Really? Was this decision reached by the community consensus? The linux kernel has many uses, and using it for servers by a big company, you are the voice of, is only one part of the whole picture. I'm sure that 32-bit is quite relevant for embedded users and more than relevant to a student or an enthusiast in some remote part of the world. As a maintainer, you should also take care of the communities that are somehow neglected, where unilateral decisions like the one above can have unwanted consequences. > This looks like too crazy micro-optimization to me to be worth the 40 > insertions. If the line count is the problem, I can easily parametrize new and existing big macro descriptions in a follow-up patch. However, I was advised to not mix everything together in one patch, but rest assured, the creation and testing of the follow-up patch would take me less time than writing the message you are reading. > But I could be wrong and I'd gladly read explanations for why I am. I would understand if someone asked you to write some new functionality for a seemingly obsolete target. Everybody would understand your rejection. But this improvement was written by me in my time, and demonstrated some benefit to the existing code. The patch is almost trivial and it took maybe 5 minutes of the maintainer's time to approve it. It brings no future maintenance burden, but it perhaps improves someone's life a tiny bit. Last, but not least, I'm bringing some ideas from the compiler development community, where the attitude to redundant instructions is totally different. It could take weeks of effort and considerable rewrite of compiler functionality just to remove one instruction ;) Micro-optimizations add up! Thanks for reading this, Uros.