On Tue, Sep 1, 2020 at 9:12 PM Jason A. Donenfeld <Jason@xxxxxxxxx> wrote:
>
> On Tue, Sep 1, 2020 at 8:13 PM Jason A. Donenfeld <Jason@xxxxxxxxx> wrote:
> > operands are the same. Also, have you seen any measurable differences
> > when benching this? I can stick it into kbench9000 to see if you
> > haven't looked yet.
>
> On a Skylake server (Xeon Gold 5120), I'm unable to see any measurable
> difference with this, at all, no matter how much I median or mean or
> reduce noise by disabling interrupts.
>
> One thing that sticks out is that all the replacements of r8-r15 by
> their %r8d-r15d counterparts still have the REX prefix, as is
> necessary to access those registers. The only ones worth changing,
> then, are the legacy registers, and on a whole, this amounts to only
> 48 bytes of difference.

The patch implements one of the x86 target-specific optimizations
performed by gcc. The optimization saves one byte of code per
instruction where the REX prefix can be omitted, i.e. with legacy
registers, but otherwise it should have no measurable runtime effect.

Since gcc applies this optimization universally to all integer
registers, I took the same approach and applied the same change to
both legacy and REX registers.

As measured above, 48 bytes saved is a good result for such a trivial
optimization.

Uros.
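To make the REX-prefix point concrete, here is a minimal Python sketch that hand-encodes the register-zeroing idiom `xor reg, reg` (opcode 0x31, XOR r/m, r) in its 64-bit and 32-bit forms. The helper name `encode_xor_zero` and the `REGS` table are hypothetical, and only the register-direct form is modeled. On x86-64, writing a 32-bit register zero-extends into the full 64-bit register, so both forms clear the whole register. The 64-bit form always needs REX.W, while r8-r15 need REX.R/REX.B regardless of operand size, so only the legacy registers get shorter.

```python
# Hypothetical helper: encode "xor reg, reg" (opcode 0x31, XOR r/m, r)
# at 32- or 64-bit operand size. Only register-direct operands are modeled.

REGS = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
        "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]


def encode_xor_zero(reg: str, width: int) -> bytes:
    """Return the machine-code bytes for 'xor reg, reg' at the given width."""
    if width not in (32, 64):
        raise ValueError("width must be 32 or 64")
    num = REGS.index(reg)   # register number 0..15
    ext = num >> 3          # 1 for r8-r15, which need REX.R / REX.B
    low = num & 7           # low 3 bits are encoded in the ModRM byte
    w = 1 if width == 64 else 0
    out = bytearray()
    if w or ext:
        # REX = 0100WRXB; the same register is both the reg and the r/m
        # operand, so REX.R and REX.B are set together.
        out.append(0x40 | (w << 3) | (ext << 2) | ext)
    out.append(0x31)                     # XOR r/m32|64, r32|64
    out.append(0xC0 | (low << 3) | low)  # ModRM: mod=11 (register direct)
    return bytes(out)


if __name__ == "__main__":
    for reg in REGS:
        q = encode_xor_zero(reg, 64)
        l = encode_xor_zero(reg, 32)
        print(f"{reg:>4}: xorq {q.hex(' ')}  xorl {l.hex(' ')}  "
              f"saved {len(q) - len(l)}")
```

For %rax this gives `48 31 c0` versus `31 c0` (one byte saved), while for %r8 it gives `4d 31 c0` versus `45 31 c0`, the same length, which matches the observation that the r8-r15 replacements do not shrink the code.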