On Wed, Sep 02, 2020 at 07:50:36AM +0200, Uros Bizjak wrote:
> On Tue, Sep 1, 2020 at 9:12 PM Jason A. Donenfeld <Jason@xxxxxxxxx> wrote:
> >
> > On Tue, Sep 1, 2020 at 8:13 PM Jason A. Donenfeld <Jason@xxxxxxxxx> wrote:
> > > operands are the same. Also, have you seen any measurable differences
> > > when benching this? I can stick it into kbench9000 to see if you
> > > haven't looked yet.
> >
> > On a Skylake server (Xeon Gold 5120), I'm unable to see any measurable
> > difference with this, at all, no matter how much I median or mean or
> > reduce noise by disabling interrupts.
> >
> > One thing that sticks out is that all the replacements of r8-r15 by
> > their %r8d-r15d counterparts still have the REX prefix, as is
> > necessary to access those registers. The only ones worth changing,
> > then, are the legacy registers, and on a whole, this amounts to only
> > 48 bytes of difference.
>
> The patch implements one of x86 target specific optimizations,
> performed by gcc. The optimization results in code size savings of one
> byte, where REX prefix is omitted with legacy registers, but otherwise
> should have no measurable runtime effect. Since gcc applies this
> optimization universally to all integer registers, I took the same
> approach and implemented the same change to legacy and REX registers.
> As measured above, 48 bytes saved is a good result for such a trivial
> optimization.

Could we instead implement this optimization in GAS?

Then we can leave the code as-is.
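
For reference, the one-byte saving discussed above can be reproduced by hand. The sketch below is only an illustration, under the assumption that the instructions in question are register-to-register xors with identical operands (the zeroing idiom); the helper names encode_xor_same and bytes_saved are made up for this example and come from neither the patch nor any assembler. It encodes just this single instruction form and compares the 64-bit and 32-bit variants.

```python
# Hand-rolled encoder for "xor reg, reg" with identical operands
# (opcode 0x31, register-direct ModRM). Illustration only: the helper
# names are hypothetical and only this one instruction form is covered.

REG_NAMES = ["ax", "cx", "dx", "bx", "sp", "bp", "si", "di"]


def encode_xor_same(reg_index: int, wide: bool) -> bytes:
    """Encode xor of a register with itself.

    reg_index: 0-15 (0-7 are the legacy registers, 8-15 are r8-r15).
    wide: True for 64-bit operand size (REX.W set), False for 32-bit.
    """
    if not 0 <= reg_index <= 15:
        raise ValueError("register index must be in 0..15")
    high = reg_index >> 3      # 1 for r8-r15, which need REX.R and REX.B
    low = reg_index & 7        # low three bits go into ModRM
    # REX layout is 0100WRXB; X stays 0 because there is no index register.
    rex = 0x40 | (int(wide) << 3) | (high << 2) | high
    modrm = 0xC0 | (low << 3) | low        # mod=11, reg=low, rm=low
    # A REX byte with no bits set is simply omitted.
    prefix = bytes([rex]) if rex != 0x40 else b""
    return prefix + bytes([0x31, modrm])


def bytes_saved(reg_index: int) -> int:
    """Size difference between the 64-bit and the 32-bit encoding."""
    return len(encode_xor_same(reg_index, True)) - len(encode_xor_same(reg_index, False))


def reg_name(reg_index: int, wide: bool) -> str:
    """AT&T-style register name, e.g. %rax / %eax or %r8 / %r8d."""
    if reg_index < 8:
        return ("%r" if wide else "%e") + REG_NAMES[reg_index]
    return f"%r{reg_index}" + ("" if wide else "d")


if __name__ == "__main__":
    for i in range(16):
        q = encode_xor_same(i, True)
        d = encode_xor_same(i, False)
        print(f"xor {reg_name(i, True)},{reg_name(i, True)}: {q.hex(' ')}   "
              f"xor {reg_name(i, False)},{reg_name(i, False)}: {d.hex(' ')}   "
              f"saved: {bytes_saved(i)}")
```

The 32-bit form still clears the whole register, because a write to a 32-bit register zero-extends into the upper half. For the legacy registers the REX byte disappears entirely, while for r8-r15 only the W bit is dropped and the prefix remains, so nothing is saved there.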