On Fri, Sep 06, 2024, James Houghton wrote:
> On Fri, Sep 6, 2024 at 5:53 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
> >  #ifdef __x86_64__
> > -       asm volatile(".byte 0xc6,0x40,0x0,0x0" :: "a" (gpa) : "memory"); /* MOV RAX, [RAX] */
> > +       asm volatile(".byte 0x48,0x89,0x00" :: "a"(gpa) : "memory"); /* mov %rax, (%rax) */
>
> FWIW I much prefer the trailing comment you have ended up with vs. the
> one you had before. (To me, the older one _seems_ like it's Intel
> syntax, in which case the comment says it's a load..? The comment you
> have now is, to me, obviously indicating a store. Though... perhaps
> "movq"?)

TL;DR: "movq" is arguably a worse mnemonic than simply "mov" because MOV *and*
MOVQ are absurdly overloaded mnemonics, and because x86-64 is wonky.

Heh, "movq" is technically a different instruction (MMX/SSE instruction).  For
ambiguous mnemonics, the assembler infers the exact instruction from the
operands.  When a register is the source or destination, appending the size to
a vanilla MOV is 100% optional, as the width of the register communicates the
desired size without any ambiguity.

When there is no register operand, e.g. storing an immediate to memory, the
size becomes necessary, sort of.  The assembler will still happily accept an
inferred size, but the size is simply the default operand size for the current
mode.

E.g.

  mov $0xffff, (%0)

will generate a 4-byte MOV

  c7 00 ff ff 00 00

so if you actually wanted a 2-byte MOV, the mnemonic needs to be:

  movw $0xffff, (%0)

There is still value in specifying an explicit operand size in assembly, as it
disambiguates the size for human readers, and also generates an error if the
operands mismatch.

E.g.

  movw $0xffff, %%eax

will fail with

  incorrect register `%eax' used with `w' suffix

The really fun one is if you want to load a 64-bit GPR with an immediate.
All else being equal, the assembler will generally optimize for code size, and
so if the desired value can be generated by sign-extension, the assembler will
opt for opcode 0xc7 (sign-extended 32-bit immediate) over 0xb8 (full 64-bit
immediate).

E.g.

  mov $0xffffffffffffffff, %%rax

generates

  48 c7 c0 ff ff ff ff

whereas, somewhat counter-intuitively, this

  mov $0xffffffff, %%rax

generates the more gnarly

  48 b8 ff ff ff ff 00 00 00 00

But wait, there's more!  If the developer were a wee bit smarter, they
could/should actually write

  mov $0xffffffff, %%eax

to generate

  b8 ff ff ff ff

because in x86-64, writing the lower 32 bits of a 64-bit register
architecturally clears the upper 32 bits.  I mention this because you'll
actually see the compiler take advantage of this behavior.

E.g. if you were to load RAX through an inline asm constraint

  asm volatile(".byte 0xcc" :: "a"(0xffffffff) : "memory");

the generated code will indeed be:

  b8 ff ff ff ff          mov    $0xffffffff,%eax

or if you explicitly load a register with '0'

  31 c0                   xor    %eax,%eax

Lastly, because "%0" in 64-bit mode refers to RAX, not EAX, this:

  asm volatile("mov $0xffffffff, %0" :: "a"(gpa) : "memory");

generates

  48 b8 ff ff ff ff 00 00 00 00

i.e. is equivalent to "mov .., %%rax".

Jumping back to "movq", it's perfectly fine in this case, but also fully
redundant.  And so I would prefer to document it simply as "mov", because
"movq" would be more appropriate to document something like this:

  asm volatile("movq %0, %%xmm0" :: "a"(gpa) : "memory");

  66 48 0f 6e c0          movq   %rax,%xmm0

LOL, which brings up more quirks/warts with x86-64.  Many instructions in x86,
especially SIMD instructions, have mandatory "prefixes" in order to squeeze
more instructions out of the available opcodes.  E.g. the operand size prefix,
0x66, is reserved for MMX instructions, which allows the architecture to usurp
the reserved combination for XMM instructions.

  Table 9-3. Effect of Prefixes on MMX Instructions

says this

  Operand Size (66H)    Reserved and may result in unpredictable behavior.

and specifically says "unpredictable behavior" instead of #UD, because
prefixing most MMX instructions with 0x66 "promotes" the instruction to operate
on XMM registers.

And then there's the REX prefix, which is actually four prefixes built into
one.  The "base" prefix is 0x40, with the lower 4 bits encoding the four "real"
prefixes.
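
To make the "four prefixes in one" layout concrete, below is a minimal C sketch
that splits a REX byte into its four bits.  The struct, the function names, and
the self-test are made up for illustration and are not taken from the patch or
from any KVM code.  The W/R/X/B bit positions follow the standard REX encoding,
and bytes 0x40-0x4f only act as REX prefixes in 64-bit mode.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical decoded view of a REX prefix byte (0x40..0x4f). */
struct rex_bits {
	bool w;	/* bit 3: 64-bit operand size */
	bool r;	/* bit 2: extends ModRM.reg */
	bool x;	/* bit 1: extends SIB.index */
	bool b;	/* bit 0: extends ModRM.rm, SIB.base, or the opcode register */
};

/* A byte is a REX prefix (in 64-bit mode) if its upper nibble is 0x4. */
static bool is_rex(uint8_t byte)
{
	return (byte & 0xf0) == 0x40;
}

/* Pull the four "real" prefixes out of the low nibble. */
static struct rex_bits decode_rex(uint8_t byte)
{
	struct rex_bits rex = {
		.w = (byte & 0x08) != 0,
		.r = (byte & 0x04) != 0,
		.x = (byte & 0x02) != 0,
		.b = (byte & 0x01) != 0,
	};

	return rex;
}

/* Sanity checks against bytes that appear earlier in this mail. */
static void rex_selftest(void)
{
	struct rex_bits rex = decode_rex(0x48);

	/* 0x48 leads "48 89 00": REX with only W set. */
	assert(is_rex(0x48));
	assert(rex.w && !rex.r && !rex.x && !rex.b);

	/* 0x66 is the operand size prefix, not a REX prefix. */
	assert(!is_rex(0x66));
}
```

Feeding it the 0x48 from the ".byte 0x48,0x89,0x00" sequence at the top of this
thread yields only W set, which is what makes that MOV a 64-bit store.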