On Mon, Aug 17, 2020 at 07:19:17PM +0200, Paolo Bonzini wrote:
> On 17/08/20 18:42, Sean Christopherson wrote:
> > On Fri, Aug 14, 2020 at 09:21:05PM +0800, Yang Weijiang wrote:
> >> If debug_regs.c is built with newer gcc, e.g., 8.3.1 on my side, then the generated
> >> binary looks like over-optimized by gcc:
> >>
> >> asm volatile("ss_start: "
> >>              "xor %%rax,%%rax\n\t"
> >>              "cpuid\n\t"
> >>              "movl $0x1a0,%%ecx\n\t"
> >>              "rdmsr\n\t"
> >>              : : : "rax", "ecx");
> >>
> >> is translated to :
> >>
> >> 000000000040194e <ss_start>:
> >>   40194e:       31 c0                   xor    %eax,%eax   <----- rax->eax?
> >>   401950:       0f a2                   cpuid
> >>   401952:       b9 a0 01 00 00          mov    $0x1a0,%ecx
> >>   401957:       0f 32                   rdmsr
> >>
> >> As you can see rax is replaced with eax in target binary code.
> >
> > It's an optimization.  `xor rax, rax` and `xor eax, eax` yield the exact
> > same result, as writing the lower 32 bits of a GPR in 64-bit mode clears
> > the upper 32 bits.  Using the eax variant avoids the REX prefix and saves
> > a byte of code.
>
> I would have expected that from binutils though, not GCC.
>
> > Use `xor %%eax, %%eax`.  That should always generate a 2 byte instruction.
> > Encoding a 64-bit operation would technically be legal, but I doubt any
> > compiler would do that in practice.
>
> Indeed, and in addition the clobbers are incorrect since they miss rbx
> and rdx.  I've sent a patch.
>
Thanks Paolo and Sean for the feedback!
> Paolo
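
For reference, the two points raised in the thread can be sketched in a small user-space C file. This is only an illustration under a few assumptions, not the patch that was sent: the function names zero_via_eax() and serialize_with_cpuid() are made up here, the rdmsr is left out because it faults outside ring 0, and the non-x86 fallbacks exist only so the file still builds elsewhere. The first helper shows that the 2-byte `xor %eax,%eax` (31 c0) clears all 64 bits of the register, just like the 3-byte `xor %rax,%rax` (48 31 c0). The second lists every register that cpuid writes in the clobber list.

```c
#include <stdint.h>

/*
 * Clear a 64-bit value using only the 32-bit form of xor.
 * In 64-bit mode a write to a 32-bit register zero-extends into the
 * upper half, so the whole register ends up as 0.
 */
static uint64_t zero_via_eax(uint64_t seed)
{
	uint64_t val = seed;
#if defined(__x86_64__)
	/* %k0 prints the 32-bit name of the register that holds val. */
	__asm__ volatile("xorl %k0, %k0" : "+r"(val) : : "cc");
#else
	val = 0;	/* assumption: fallback for non-x86 builds */
#endif
	return val;
}

/*
 * Run cpuid leaf 0 with the 32-bit xor, and tell the compiler about
 * every register cpuid overwrites (eax, ebx, ecx, edx), not just rax.
 */
static void serialize_with_cpuid(void)
{
#if defined(__x86_64__)
	__asm__ volatile("xor %%eax, %%eax\n\t"
			 "cpuid\n\t"
			 : : : "rax", "rbx", "rcx", "rdx");
#endif
}
```

Calling zero_via_eax() with all upper bits set, e.g. zero_via_eax(0xffffffffffffffffULL), should return 0 on x86-64, which is why the compiler or assembler is free to pick the shorter encoding.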