Re: [PATCH -v2] kvm: Emulate MOVBE

Borislav Petkov <bp@xxxxxxxxx> · Sun, 14 Apr 2013 23:02:18 +0200

On Sun, Apr 14, 2013 at 11:43:03AM +0300, Gleb Natapov wrote:
> > +#define EmulateOnUD ((u64)1 << 46)  /* emulate if unsupported by the host */
> >  
> Just rename VendorSpecific to EmulateOnUD. The meaning is the same.

done.

> > +static int em_movbe(struct x86_emulate_ctxt *ctxt)
> > +{
> > +	switch (ctxt->op_bytes) {
> > +	case 2:
> > +		*(u16 *)ctxt->dst.valptr = swab16(*(u16 *)ctxt->src.valptr);
> What's wrong with: ctxt->dst.val = swab16((u16)ctxt->src.val)?

This doesn't work for the 16-bit swab because of the following
functionality of MOVBE:

"When the operand size is 16 bits, the upper word of the destination
register remains unchanged."

Now here's what gcc produces here:

	movzwl	112(%rdi), %eax	# ctxt_5(D)->src.D.27823.val, tmp83
	rolw	$8, %ax	#, tmp83
	movzwl	%ax, %eax	# tmp83, tmp84
	movq	%rax, 240(%rdi)	# tmp84,

See the zero-extension happening twice (second one should not be needed,
even) and it is actually writing the whole 64-bit value in %rax back
which effectively destroys the upper word?

However, this, a bit uglier version works:

	*(u16 *)&ctxt->dst.val = swab16((u16)ctxt->src.val);

	movzwl  112(%rdi), %eax # ctxt_5(D)->src.D.27823.val, tmp82
	rolw    $8, %ax #, tmp82
	movw    %ax, 240(%rdi)  # tmp82, MEM[(u16 *)ctxt_5(D) + 240B]

Whichever version we take, I'll slap a comment explaining why we're
doing it differently for 16-bit.

> > +	case 4:
> > +		*(u32 *)ctxt->dst.valptr = swab32(*(u32 *)ctxt->src.valptr);
> > +
> > +		/*
> > +		 * clear upper dword for 32-bit operand size in 64-bit mode.
> > +		 */
> > +		if (ctxt->mode == X86EMUL_MODE_PROT64)
> > +			*((u32 *)ctxt->dst.valptr + 1) = 0x0;
> It should be clean without this this.

Right, this should probably work because for 32-bit, we don't care
about the zero-extended upper dword and in 64-bit mode we want to do it
anyway.

> > +		break;
> > +	case 8:
> > +		*(u64 *)ctxt->dst.valptr = swab64(*(u64 *)ctxt->src.valptr);

Ditto for this one.

> > +static const struct opcode threebyte_table[] = {
> > +	[0xf0] = I(DstReg | SrcMem | ModRM | Mov | EmulateOnUD, em_movbe),
> > +	[0xf1] = I(DstMem | SrcReg | ModRM | Mov | EmulateOnUD, em_movbe),
> > +};
> > +
> Populate the whole table explicitly please. You can use X16(N) to make it
> short. The encoding is also wrong since 0F 38 F0/F1 instruction decoding
> depend on a prefix.

... on the absence of a prefix, actually.

> Same opcode is decoded as CRC32 with f2 prefix. We have Prefix
> mechanism for such instructions, but it currently assumes that 66 and
> f2/f3 prefixes are mutually exclusive, but in this case they are not,
> ugh.

Yeah, I'm lazy and wanted to leave the honors to the poor soul who
implements the whole 0F_38h opcode map. :-)

Ok, I probably could do something similar to pfx_vmovntpx with the GP
macro for the F0/F1 opcodes. I'll give it a try when I get a chance
soonish.

> > +
> > +		if (ctxt->b == 0x38)
> > +			opcode = threebyte_table[insn_fetch(u8, ctxt)];
> ctxt->b = insn_fetch(u8, ctxt);
> opcode = threebyte_table[ctxt->b];
> 
> Otherwise OpImplicit type of decoding will not work for three byte
> instructions. Also should rename twobyte into opbytes and set it to 1,
> 2 or 3. I prefer that three byte instruction decoding will be sent as a
> separate patch.

Ok.

Thanks.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html