Re: KVM: unknown exit, hardware reason 31

I guess it happened in this scenario:


1. QEMU drops the mmio region
2. All mmio sptes are invalidated
3. The following race then occurs:

        VCPU 0                          KVM        VCPU 1
    access the invalid mmio spte
                                   page reclaim
                                   zap shadow page

                                                access the region that was previously MMIO
                                                set the spte to a normal RAM mapping

    mmio #PF
    check the spte and see that it has become a normal RAM mapping !!!
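
For reference, the level 1 spte in the dmesg output from the original report
(0x52e40dad77) does look like a normal RAM mapping rather than an mmio spte.
A minimal sketch of how its low bits can be read, assuming the EPT entry
layout from the Intel SDM (bits 2:0 are read/write/execute, bits 5:3 are the
memory type, bit 6 is "ignore PAT"). The helper names are made up for
illustration and are not kernel functions:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Low bits of an EPT entry, per the Intel SDM layout. */
#define EPT_READ      (1ULL << 0)
#define EPT_WRITE     (1ULL << 1)
#define EPT_EXEC      (1ULL << 2)
#define EPT_RWX       (EPT_READ | EPT_WRITE | EPT_EXEC)
#define EPT_MT_SHIFT  3
#define EPT_MT_MASK   0x7ULL
#define EPT_MT_WB     6U          /* write-back memory type */

/*
 * Writable but not readable is one of the permission combinations
 * that raises an EPT misconfiguration.
 */
static bool ept_perm_misconfig(uint64_t spte)
{
	return !(spte & EPT_READ) && (spte & EPT_WRITE);
}

/* Memory type field; only meaningful for an entry that maps a page. */
static unsigned int ept_leaf_memtype(uint64_t spte, int level)
{
	assert(level >= 1 && level <= 4);
	return (unsigned int)((spte >> EPT_MT_SHIFT) & EPT_MT_MASK);
}

/* Hypothetical check: full RWX permissions and write-back memory type. */
static bool ept_looks_like_ram(uint64_t spte, int level)
{
	return (spte & EPT_RWX) == EPT_RWX &&
	       ept_leaf_memtype(spte, level) == EPT_MT_WB;
}
```

With these helpers, 0x52e40dad77 at level 1 decodes as RWX with memory type
write-back, and ept_perm_misconfig() is false for it, which would fit the
picture of VCPU 1 having already replaced the mmio spte by the time it was
inspected.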


The issue is caused by the fast invalidation of mmio sptes, which increases
the generation number instead of zapping the mmio sptes (SRCU can ensure that
the vcpu sees either the mmio spte or a zapped / being-zapped spte).

The simple fix is to just drop check_direct_spte_mmio_pf() and let the VCPU
access the address again, as follows:

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 4417146..299a5da 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3299,21 +3299,6 @@ static bool quickly_check_mmio_pf(struct kvm_vcpu *vcpu, u64 addr, bool direct)
        return vcpu_match_mmio_gva(vcpu, addr);
 }

-
-/*
- * On direct hosts, the last spte is only allows two states
- * for mmio page fault:
- *   - It is the mmio spte
- *   - It is zapped or it is being zapped.
- *
- * This function completely checks the spte when the last spte
- * is not the mmio spte.
- */
-static bool check_direct_spte_mmio_pf(u64 spte)
-{
-       return __check_direct_spte_mmio_pf(spte);
-}
-
 static u64 walk_shadow_page_get_mmio_spte(struct kvm_vcpu *vcpu, u64 addr)
 {
        struct kvm_shadow_walk_iterator iterator;
@@ -3356,13 +3341,6 @@ int handle_mmio_page_fault_common(struct kvm_vcpu *vcpu, u64 addr, bool direct)
        }

        /*
-        * It's ok if the gva is remapped by other cpus on shadow guest,
-        * it's a BUG if the gfn is not a mmio page.
-        */
-       if (direct && !check_direct_spte_mmio_pf(spte))
-               return RET_MMIO_PF_BUG;
-
-       /*
         * If the page table is zapped by other cpus, let CPU fault again on
         * the address.
         */

Pavel, could you please check if it works for you?

I will think the case through fully and post a proper fix...

On 07/25/2015 03:25 AM, Pavel Shirshov wrote:
Hello,

I'm running a lot of identical VMs under KVM. Sometimes (about once per
2000-3000 runs) I get the following:

1. The VM is paused in libvirt. It can't simply be resumed; I can only reset
it and then resume.
2. In the VM log file I see the following: "KVM: unknown exit, hardware reason
31" with a CPU dump.
3. In dmesg I see the following:
[84245.284948] EPT: Misconfiguration.
[84245.285056] EPT: GPA: 0xfeda848
[84245.285154] ept_misconfig_inspect_spte: spte 0x5eaef50107 level 4
[84245.285344] ept_misconfig_inspect_spte: spte 0x5f5fadc107 level 3
[84245.285532] ept_misconfig_inspect_spte: spte 0x5141d18107 level 2
[84245.285723] ept_misconfig_inspect_spte: spte 0x52e40dad77 level 1

OS: 3.16.0-44-generic #59~14.04.1-Ubuntu SMP
QEMU: QEMU emulator version 2.0.0 (Debian 2.0.0+dfsg-2ubuntu1.14),
Copyright (c) 2003-2008 Fabrice Bellard

Is it a Linux KVM bug or a CPU bug? How can I fix that?

I can reproduce the bug within one or two days. Is it possible to enable
deeper debugging for this issue?

Thanks
