Re: [PATCH] KVM: x86: inject exceptions produced by x86_decode_insn

On Wed, Nov 29, 2017 at 12:44:42PM +0100, Paolo Bonzini wrote:
> On 29/11/2017 12:44, Eduardo Habkost wrote:
> > On Mon, Nov 13, 2017 at 09:32:09AM +0100, Paolo Bonzini wrote:
> >> On 13/11/2017 08:15, Wanpeng Li wrote:
> >>> 2017-11-10 17:49 GMT+08:00 Paolo Bonzini <pbonzini@xxxxxxxxxx>:
> >>>> Sometimes, a processor might execute an instruction while another
> >>>> processor is updating the page tables for that instruction's code page,
> >>>> but before the TLB shootdown completes.  The interesting case happens
> >>>> if the page is in the TLB.
> >>>>
> >>>> In general, the processor will succeed in executing the instruction and
> >>>> nothing bad happens.  However, what if the instruction is an MMIO access?
> >>>> If *that* happens, KVM invokes the emulator, and the emulator gets the
> >>>> updated page tables.  If the update side had marked the code page as non
> >>>> present, the page table walk then will fail and so will x86_decode_insn.
> >>>>
> >>>> Unfortunately, even though kvm_fetch_guest_virt is correctly returning
> >>>> X86EMUL_PROPAGATE_FAULT, x86_decode_insn's caller treats the failure as
> >>>> a fatal error if the instruction cannot simply be reexecuted (as is the
> >>>> case for MMIO).  And this in fact happened sometimes when rebooting
> >>>> Windows 2012r2 guests.  Just checking ctxt->have_exception and injecting
> >>>> the exception if true is enough to fix the case.
> >>>
> >>> I found the only place which can set ctxt->have_exception is in the
> >>> function x86_emulate_insn(), and x86_decode_insn() will not set
> >>> ctxt->have_exception even if kvm_fetch_guest_virt() returns
> >>> X86EMUL_PROPAGATE_FAULT.
> >>
> >> Hmm, you're right.  Looks like Yanan has been (un)lucky when trying out
> >> this patch! :(
> >>
> >> Yanan, can you double check that you can reproduce the issue with an
> >> unpatched kernel?  I will work on a kvm-unit-tests testcase.
> > 
> > We don't have a kvm-unit-tests reproducer for this yet, right?
> > 
> > I'm considering trying to write one, but I don't want to
> > duplicate work.
> 
> No, I haven't written one yet.

The reproducer (not a full test case) is quite simple; see the patch below.

Now, I've noticed something interesting when running the
reproducer:

If the test_fetch_failure() call happens before we touch
pci-testdev through *mem (like in the patch below), we get an
emulation failure like the one Yanan saw:

  $ /usr/bin/qemu-system-x86_64 -nodefaults -device pc-testdev -device isa-debug-exit,iobase=0xf4,iosize=0x4 -vnc none -serial stdio -device pci-testdev -machine accel=kvm -kernel ./x86/emulator.flat # -initrd /tmp/tmp.RCPjppRp8i
  enabling apic
  paging enabled
  cr0 = 80010011
  cr3 = 45e000
  cr4 = 20
  KVM internal error. Suberror: 1
  emulation failure
  RAX=0000000000000000 RBX=0000000000000000 RCX=0000000000000000 RDX=0000000000000000
  RSI=0000000000000000 RDI=0000000000000000 RBP=0000000000000000 RSP=0000000000000000
  R8 =0000000000000000 R9 =0000000000000000 R10=0000000000000000 R11=0000000000000000
  R12=0000000000000000 R13=0000000000000000 R14=0000000000000000 R15=0000000000000000
  RIP=ffffffffffffc08a RFL=00010002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
  ES =0010 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
  CS =0008 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
  SS =0010 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
  DS =0010 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
  FS =0010 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
  GS =0010 0000000000454d60 ffffffff 00c09300 DPL=0 DS   [-WA]
  LDT=0000 0000000000000000 0000ffff 00008200 DPL=0 LDT
  TR =0080 000000000041148a 0000ffff 00008b00 DPL=0 TSS64-busy
  GDT=     000000000041100a 0000047f
  IDT=     0000000000000000 00000fff
  CR0=80010011 CR2=ffffffffffffc08a CR3=000000000045e000 CR4=00000020
  DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
  DR6=00000000ffff0ff0 DR7=0000000000000400
  EFER=0000000000000500
  Code=?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? <??> ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??

but if I call test_fetch_failure() after touching *mem, like this:

    diff --git a/x86/emulator.c b/x86/emulator.c
    index 977ec75..72cb035 100644
    --- a/x86/emulator.c
    +++ b/x86/emulator.c
    @@ -1124,7 +1124,6 @@ int main()
            alt_insn_page = alloc_page();
            insn_ram = vmap(virt_to_phys(insn_page), 4096);
    
    -       test_fetch_failure(mem, alt_insn_page);
    
            // test mov reg, r/m and mov r/m, reg
            t1 = 0x123456789abcdef;
    @@ -1135,6 +1134,8 @@ int main()
                         : "memory");
            report("mov reg, r/m (1)", t2 == 0x123456789abcdef);
    
    +       test_fetch_failure(mem, alt_insn_page);
    +
            test_simplealu(mem);
            test_cmps(mem);
            test_scas(mem);

then I get a KVM_INTERNAL_ERROR_DELIVERY_EV:

    $ /usr/bin/qemu-system-x86_64 -nodefaults -device pc-testdev -device isa-debug-exit,iobase=0xf4,iosize=0x4 -vnc none -serial stdio -device pci-testdev -machine accel=kvm -kernel ./x86/emulator.flat # -initrd /tmp/tmp.lmXZa46TEA
    enabling apic
    paging enabled
    cr0 = 80010011
    cr3 = 45e000
    cr4 = 20
    PASS: mov reg, r/m (1)
    KVM internal error. Suberror: 3
    extra data[0]: 80000b0e
    extra data[1]: 31
    extra data[2]: 182
    extra data[3]: ff000ff8
    RAX=0000000000000000 RBX=0000000000000000 RCX=0000000000000000 RDX=0000000000000000
    RSI=0000000000000000 RDI=0000000000000000 RBP=0000000000000000 RSP=0000000000000000
    R8 =0000000000000000 R9 =0000000000000000 R10=0000000000000000 R11=0000000000000000
    R12=0000000000000000 R13=0000000000000000 R14=0000000000000000 R15=0000000000000000
    RIP=ffffffffffffc08a RFL=00010002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
    ES =0010 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
    CS =0008 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
    SS =0010 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
    DS =0010 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
    FS =0010 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
    GS =0010 0000000000454d60 ffffffff 00c09300 DPL=0 DS   [-WA]
    LDT=0000 0000000000000000 0000ffff 00008200 DPL=0 LDT
    TR =0080 000000000041148a 0000ffff 00008b00 DPL=0 TSS64-busy
    GDT=     000000000041100a 0000047f
    IDT=     0000000000000000 00000fff
    CR0=80010011 CR2=ffffffffffffc08a CR3=000000000045e000 CR4=00000020
    DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
    DR6=00000000ffff0ff0 DR7=0000000000000400
    EFER=0000000000000500
    Code=?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? <??> ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??
    ^C

Also, if I run the reproducer using ept=0, it gets stuck in a
loop re-entering the same "in (%dx),%al" instruction over and
over again.  trace-cmd report output:

    qemu-system-x86-18185 [001] 1057573.830491: kvm_exit:             reason IO_INSTRUCTION rip 0xffffffffffffc08a info 8 0
    qemu-system-x86-18185 [001] 1057573.830494: kvm_emulate_insn:     0:ffffffffffffc08a: 4d 89 2c 24
    qemu-system-x86-18185 [001] 1057573.830503: kvm_entry:            vcpu 0
    qemu-system-x86-18185 [001] 1057573.830504: kvm_exit:             reason IO_INSTRUCTION rip 0xffffffffffffc08a info 8 0
    qemu-system-x86-18185 [001] 1057573.830505: kvm_emulate_insn:     0:ffffffffffffc08a: 4d 89 2c 24
    qemu-system-x86-18185 [001] 1057573.830506: kvm_entry:            vcpu 0
    qemu-system-x86-18185 [001] 1057573.830507: kvm_exit:             reason IO_INSTRUCTION rip 0xffffffffffffc08a info 8 0
    qemu-system-x86-18185 [001] 1057573.830508: kvm_emulate_insn:     0:ffffffffffffc08a: 4d 89 2c 24
    qemu-system-x86-18185 [001] 1057573.830509: kvm_entry:            vcpu 0
    qemu-system-x86-18185 [001] 1057573.830510: kvm_exit:             reason IO_INSTRUCTION rip 0xffffffffffffc08a info 8 0
    qemu-system-x86-18185 [001] 1057573.830511: kvm_emulate_insn:     0:ffffffffffffc08a: 4d 89 2c 24
    qemu-system-x86-18185 [001] 1057573.830511: kvm_entry:            vcpu 0
    qemu-system-x86-18185 [001] 1057573.830512: kvm_exit:             reason IO_INSTRUCTION rip 0xffffffffffffc08a info 8 0
    qemu-system-x86-18185 [001] 1057573.830513: kvm_emulate_insn:     0:ffffffffffffc08a: 4d 89 2c 24
    qemu-system-x86-18185 [001] 1057573.830514: kvm_entry:            vcpu 0
    qemu-system-x86-18185 [001] 1057573.830514: kvm_exit:             reason IO_INSTRUCTION rip 0xffffffffffffc08a info 8 0
    qemu-system-x86-18185 [001] 1057573.830515: kvm_emulate_insn:     0:ffffffffffffc08a: 4d 89 2c 24
    qemu-system-x86-18185 [001] 1057573.830516: kvm_entry:            vcpu 0
    qemu-system-x86-18185 [001] 1057573.830517: kvm_exit:             reason IO_INSTRUCTION rip 0xffffffffffffc08a info 8 0
    qemu-system-x86-18185 [001] 1057573.830518: kvm_emulate_insn:     0:ffffffffffffc08a: 4d 89 2c 24
    qemu-system-x86-18185 [001] 1057573.830519: kvm_entry:            vcpu 0
    qemu-system-x86-18185 [001] 1057573.830521: kvm_exit:             reason IO_INSTRUCTION rip 0xffffffffffffc08a info 8 0
    qemu-system-x86-18185 [001] 1057573.830522: kvm_emulate_insn:     0:ffffffffffffc08a: 4d 89 2c 24
    qemu-system-x86-18185 [001] 1057573.830523: kvm_entry:            vcpu 0
    [...]

Signed-off-by: Eduardo Habkost <ehabkost@xxxxxxxxxx>
---
 x86/emulator.c | 21 +++++++++++++++++----
 1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/x86/emulator.c b/x86/emulator.c
index e6f27cc..977ec75 100644
--- a/x86/emulator.c
+++ b/x86/emulator.c
@@ -792,9 +792,11 @@ static void trap_emulator(uint64_t *mem, void *alt_insn_page,
 	extern u8 insn_page[], test_insn[];
 
 	insn_ram = vmap(virt_to_phys(insn_page), 4096);
-	memcpy(alt_insn_page, insn_page, 4096);
-	memcpy(alt_insn_page + (test_insn - insn_page),
-			(void *)(alt_insn->ptr), alt_insn->len);
+	if (alt_insn_page) {
+		memcpy(alt_insn_page, insn_page, 4096);
+		memcpy(alt_insn_page + (test_insn - insn_page),
+				(void *)(alt_insn->ptr), alt_insn->len);
+	}
 	save = inregs;
 
 	/* Load the code TLB with insn_page, but point the page tables at
@@ -805,7 +807,11 @@ static void trap_emulator(uint64_t *mem, void *alt_insn_page,
 	invlpg(insn_ram);
 	/* Load code TLB */
 	asm volatile("call *%0" : : "r"(insn_ram));
-	install_page(cr3, virt_to_phys(alt_insn_page), insn_ram);
+	if (alt_insn_page) {
+		install_page(cr3, virt_to_phys(alt_insn_page), insn_ram);
+	} else {
+		install_pte(cr3, 1, insn_ram, PT_USER_MASK, 0);
+	}
 	/* Trap, let hypervisor emulate at alt_insn_page */
 	asm volatile("call *%0": : "r"(insn_ram+1));
 
@@ -1096,6 +1102,11 @@ static void test_illegal_movbe(void)
 	handle_exception(UD_VECTOR, 0);
 }
 
+static void test_fetch_failure(void *mem, void *alt_insn_page)
+{
+	trap_emulator(mem, NULL, NULL);
+}
+
 int main()
 {
 	void *mem;
@@ -1113,6 +1124,8 @@ int main()
 	alt_insn_page = alloc_page();
 	insn_ram = vmap(virt_to_phys(insn_page), 4096);
 
+	test_fetch_failure(mem, alt_insn_page);
+
 	// test mov reg, r/m and mov r/m, reg
 	t1 = 0x123456789abcdef;
 	asm volatile("mov %[t1], (%[mem]) \n\t"
-- 
2.13.6


-- 
Eduardo


