On Wed, Apr 17, 2024 at 8:41 PM Borislav Petkov <bp@xxxxxxxxx> wrote: > > If the line count is the problem, I can easily parametrize new and > > existing big macro descriptions in a follow-up patch. However, I was > > advised to not mix everything together in one patch, but rest assured, > > the creation and testing of the follow-up patch would take me less > > time than writing the message you are reading. > > I'm simply making sure we're not going off the rails with > micro-optimizing for no apparent reason. > > Saving a > > test %rax,%rax > > doesn't need fixing in my book. Because I don't think you'll be able to > even measure it. The above is perhaps a little unfortunate example taken from if (cmpxchg64(...)) where the check is against zero. The compiler can optimize the check to a TEST insn in this particular case, but otherwise CMP will be emitted for different usages. Not a big difference, but a register has to be kept live across cmpxchg8b. > > It brings no future maintenance burden, but it perhaps improves > > someone's life a tiny bit. > > This is where you and I disagree: touching that alternative in > __arch_try_cmpxchg64_emu_local() does as we tend to change them from > time to time, especially in recent times. > > And I wouldn't mind touching it but if it is there to save 10 insns on > 32-bit - which doesn't matter - then why bother? > > Or do you have a relevant 32-bit workload which brings any improvement > by this change? There is one important issue. When a register (or two for double-word values) has to be kept live for a compare, the register pressure on 32bit targets around cmpxchg8b goes through the roof, and when using the frame pointer (and maybe some fixed register, e.g. PIC), the register allocator runs out of available registers. The number of spills around cmpxchg8b signals the troubles register allocator goes through to "fix" everything, so from the compiler PoV any relief is more than welcome here. Even in GCC internal libraries, we have had to take a special approach with this insn to avoid internal compiler errors. The kernel was quite lucky here ;) Thanks and best regards, Uros.