Re: [RFC PATCH] Emulate MOVBE

Andre Przywara <andre@xxxxxxxxx> · Wed, 10 Apr 2013 11:29:42 +0200

On Wed, 10 Apr 2013 01:46:02 +0200
Borislav Petkov <bp@xxxxxxxxx> wrote:

> Hi guys,
> 
> so I was trying to repro tglx's bug in smpboot.c and for some reason,
> the most reliable way to trigger it was to boot an 32-bit atom smp
> guest in kvm (don't ask :)).
> 
> The problem, however, was that atom wants MOVBE and qemu doesn't
> emulate it yet (except Richard's patches which I used in order to be
> able to actually even boot a guest).
> 
> However, without hw acceleration, qemu is pretty slow, and waiting for
> an smp guest to boot in sw-only emulation is not fun. So, just for
> funsies, I decided to give the MOVBE emulation a try.
> 
> Patch is below, 8 core smp atom guest boots fine and in 6-ish seconds
> with it. :-)
> 
> I know, I know, it still needs cleaning up and proper rFLAGS handling
> but that is for later. For now, I'd very much like to hear whether
> this approach even makes sense and if so, what should be improved.
> 
> Thanks, and thanks to Andre and Jörg for their help.
> 
> Not-yet-signed-off-by: Borislav Petkov <bp@xxxxxxx>

You are missing an (hackish) older patch which enables the MOVBE CPUID
bit even if the host does not have it (malformed due to c&p):

--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -273,7 +273,7 @@ static int do_cpuid_ent(struct kvm_cpuid_entry2
*entry, u32 function,
		cpuid_mask(&entry->ecx, 4); /* we support x2apic
emulation even if host does not support
		 * it since we emulate x2apic in software */
-		entry->ecx |= F(X2APIC);
+		entry->ecx |= (F(X2APIC) | F(MOVBE));
		break;
	/* function 2 entries are STATEFUL. That is, repeated cpuid
commands
	 * may return different values. This forces us to get_cpu()
before

> 
> --
> diff --git a/arch/x86/include/asm/kvm_emulate.h
> b/arch/x86/include/asm/kvm_emulate.h index 15f960c06ff7..ae01c765cd77
> 100644 --- a/arch/x86/include/asm/kvm_emulate.h
> +++ b/arch/x86/include/asm/kvm_emulate.h
> @@ -281,6 +281,7 @@ struct x86_emulate_ctxt {
>  
>  	/* decode cache */
>  	u8 twobyte;
> +	u8 thirdbyte;
>  	u8 b;
>  	u8 intercept;
>  	u8 lock_prefix;
> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
> index a335cc6cde72..0ccff339359d 100644
> --- a/arch/x86/kvm/emulate.c
> +++ b/arch/x86/kvm/emulate.c
> @@ -152,6 +152,7 @@
>  #define Avx         ((u64)1 << 43)  /* Advanced Vector Extensions */
>  #define Fastop      ((u64)1 << 44)  /* Use opcode::u.fastop */
>  #define NoWrite     ((u64)1 << 45)  /* No writeback */
> +#define ThreeByte   ((u64)1 << 46)  /* Three byte opcodes 0F 38 and
> 0F 3A */ 
>  #define X2(x...) x, x
>  #define X3(x...) X2(x), x
> @@ -3107,6 +3108,34 @@ static int em_mov(struct x86_emulate_ctxt
> *ctxt) return X86EMUL_CONTINUE;
>  }
>  
> +static int em_movbe(struct x86_emulate_ctxt *ctxt)
> +{
> +	char *valptr = ctxt->src.valptr;
> +
> +	switch (ctxt->op_bytes) {
> +	case 2:
> +		*(u16 *)valptr = swab16(*(u16 *)valptr);
> +		break;
> +	case 4:
> +		*(u32 *)valptr = swab32(*(u32 *)valptr);
> +
> +		/*
> +		 * clear upper dword for 32-bit operand size in
> 64-bit mode.
> +		 */
> +		if (ctxt->mode == X86EMUL_MODE_PROT64)
> +			*((u32 *)valptr + 1) = 0x0;
> +		break;
> +	case 8:
> +		*(u64 *)valptr = swab64(*(u64 *)valptr);
> +		break;
> +	default:
> +		return X86EMUL_PROPAGATE_FAULT;
> +	}
> +
> +	memcpy(ctxt->dst.valptr, ctxt->src.valptr, ctxt->op_bytes);
> +	return X86EMUL_CONTINUE;
> +}
> +
On 04/09/2013 05:03 PM, Borislav Petkov wrote:
> 
> Note to self: this destroys the src operand but it shouldn't. Fix it
> tomorrow.
> 

What about this one?

	*(u16 *)(ctxt->dst.valptr) = swab16(*(u16 *)(ctxt->src.valptr));

Respective versions for the other operand sizes, of course, and drop
the final memcpy.

>  static int em_cr_write(struct x86_emulate_ctxt *ctxt)
>  {
>  	if (ctxt->ops->set_cr(ctxt, ctxt->modrm_reg, ctxt->src.val))
> @@ -3974,7 +4003,8 @@ static const struct opcode twobyte_table[256] =
> { I(ImplicitOps | VendorSpecific, em_sysenter),
>  	I(ImplicitOps | Priv | VendorSpecific, em_sysexit),
>  	N, N,
> -	N, N, N, N, N, N, N, N,
> +	I(ModRM | Mov | ThreeByte | VendorSpecific, em_movbe),
> +	N, N, N, N, N, N, N,

In a real world VendorSpecific should be replaced with something more
meaningful. Depends on KVMs intention to emulate instructions, actually
out of scope for a pure virtualizer.

What is the opinion from the KVM folks on this? Shall we start to
emulate instructions the host does not provide? In this particular case
a relatively simple patch fixes a problem (starting Atom optimized
kernels on non-Atom machines).

(And if one can believe the AMD Fam16h SWOG [1], PS4^Wfuture AMD
processors have MOVBE, so it's not even actually one CPU anymore).

>  	/* 0x40 - 0x4F */
>  	X16(D(DstReg | SrcMem | ModRM | Mov)),
>  	/* 0x50 - 0x5F */
> @@ -4323,6 +4353,15 @@ done_prefixes:
>  	}
>  	ctxt->d = opcode.flags;
>  
> +	if (ctxt->d & ThreeByte) {
> +		ctxt->thirdbyte = insn_fetch(u8, ctxt);
> +
> +		if (ctxt->thirdbyte == 0xf0)
> +			ctxt->d |= DstReg | SrcMem;
> +		else
> +			ctxt->d |= DstMem | SrcReg;
> +	}
> +

As already agreed on on IRC, we should properly check for the other
opcodes in the 0F 38/3A table. Or make it right and implement a real
sub-table.

Regards,
Andre.


[1]http://support.amd.com/us/Processor_TechDocs/52128_16h_Software_Opt_Guide.zip
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html