On 17/08/20 18:42, Sean Christopherson wrote:
> On Fri, Aug 14, 2020 at 09:21:05PM +0800, Yang Weijiang wrote:
>> If debug_regs.c is built with newer gcc, e.g., 8.3.1 on my side, then the generated
>> binary looks like over-optimized by gcc:
>>
>> asm volatile("ss_start: "
>>              "xor %%rax,%%rax\n\t"
>>              "cpuid\n\t"
>>              "movl $0x1a0,%%ecx\n\t"
>>              "rdmsr\n\t"
>>              : : : "rax", "ecx");
>>
>> is translated to :
>>
>> 000000000040194e <ss_start>:
>>   40194e:       31 c0                   xor    %eax,%eax <----- rax->eax?
>>   401950:       0f a2                   cpuid
>>   401952:       b9 a0 01 00 00          mov    $0x1a0,%ecx
>>   401957:       0f 32                   rdmsr
>>
>> As you can see rax is replaced with eax in target binary code.
>
> It's an optimization.  `xor rax, rax` and `xor eax, eax` yield the exact
> same result, as writing the lower 32 bits of a GPR in 64-bit mode clears
> the upper 32 bits.  Using the eax variant avoids the REX prefix and saves
> a byte of code.  I would have expected that from binutils though, not GCC.
>
> Use `xor %%eax, %%eax`.  That should always generate a 2 byte instruction.
> Encoding a 64-bit operation would technically be legal, but I doubt any
> compiler would do that in practice.

Indeed, and in addition the clobbers are incorrect since they miss rbx
and rdx.  I've sent a patch.

Paolo