I may be completely off track, but I am wondering whether uses of the
GUEST_PHYSICAL_ADDRESS field from the VMCS are wrong when operating
under nested virtualization.  My worry is that during L2->L0 vmexits
the field holds a nested GPA, while KVM expects an L1 GPA.  In
particular:

- if PML is active, the value written to the L0 page modification log
  is a nested GPA instead of an L1 GPA.  This would break nested dirty
  tracking when PML is active.

- handle_ept_misconfig would be completely broken if a guest exposes
  an I/O range to its nested guest.

I'm sure the Google guys have figured this out already and are just
testing how dumb this maintainer is. ;)  In any case, a lightly-tested
patch for the second issue follows my sig.

Paolo

---------------- 8< -------------

From 2ffb87523d8447b7505127e497bb20d1ec61a5f3 Mon Sep 17 00:00:00 2001
From: Paolo Bonzini <pbonzini@xxxxxxxxxx>
Date: Fri, 11 Aug 2017 17:45:55 +0200
Subject: [PATCH] KVM: x86: fix use of L1 MMIO areas in nested guests

There is currently some confusion between nested and L1 GPAs.  The
assignment to "direct" in kvm_mmu_page_fault tries to fix that, but
it is not enough.  What this patch does is fence off the MMIO cache
completely when using shadow nested page tables, since we have neither
a GVA nor an L1 GPA to put in the cache.  This also allows some
simplifications in kvm_mmu_page_fault and FNAME(page_fault).

The EPT misconfig handler has two changes.  First, it does not have an
L1 GPA to pass to kvm_io_bus_write, so the fast MMIO path must be
skipped in guest mode.  Second, calling handle_mmio_page_fault() has
been unnecessary since commit e9ee956e311d ("KVM: x86: MMU: Move
handle_mmio_page_fault() call to kvm_mmu_page_fault()", 2016-02-22).

Signed-off-by: Paolo Bonzini <pbonzini@xxxxxxxxxx>
---
 arch/x86/kvm/mmu.c         | 10 +++++++++-
 arch/x86/kvm/paging_tmpl.h |  3 +--
 arch/x86/kvm/vmx.c         | 27 ++++++++++++---------------
 arch/x86/kvm/x86.h         |  6 +++++-
 4 files changed, 27 insertions(+), 19 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 5e3c0f09eaa8..bedeeb633f80 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3598,6 +3598,14 @@ static bool is_shadow_zero_bits_set(struct kvm_mmu *mmu, u64 spte, int level)
 
 static bool mmio_info_in_cache(struct kvm_vcpu *vcpu, u64 addr, bool direct)
 {
+	/*
+	 * A nested guest cannot use the MMIO cache if it is using nested
+	 * page tables, because cr2 is a nGPA while the cache stores L1's
+	 * physical addresses.
+	 */
+	if (mmu_is_nested(vcpu))
+		return false;
+
 	if (direct)
 		return vcpu_match_mmio_gpa(vcpu, addr);
 
@@ -4835,7 +4843,7 @@ int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t cr2, u64 error_code,
 {
 	int r, emulation_type = EMULTYPE_RETRY;
 	enum emulation_result er;
-	bool direct = vcpu->arch.mmu.direct_map || mmu_is_nested(vcpu);
+	bool direct = vcpu->arch.mmu.direct_map;
 
 	if (unlikely(error_code & PFERR_RSVD_MASK)) {
 		r = handle_mmio_page_fault(vcpu, cr2, direct);
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 3bb90ceeb52d..86b68dc5a649 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -790,8 +790,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr, u32 error_code,
 			 &map_writable))
 		return 0;
 
-	if (handle_abnormal_pfn(vcpu, mmu_is_nested(vcpu) ? 0 : addr,
-				walker.gfn, pfn, walker.pte_access, &r))
+	if (handle_abnormal_pfn(vcpu, addr, walker.gfn, pfn, walker.pte_access, &r))
 		return r;
 
 	/*
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 283e94de3853..d9e9bf96edb3 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -6405,24 +6405,21 @@ static int handle_ept_misconfig(struct kvm_vcpu *vcpu)
 	int ret;
 	gpa_t gpa;
 
+	/*
+	 * A nested guest cannot optimize MMIO vmexits, because we have an
+	 * nGPA here instead of the required GPA.
+	 */
 	gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS);
-	if (!kvm_io_bus_write(vcpu, KVM_FAST_MMIO_BUS, gpa, 0, NULL)) {
-		trace_kvm_fast_mmio(gpa);
-		return kvm_skip_emulated_instruction(vcpu);
+	if (!is_guest_mode(vcpu)) {
+		if (!kvm_io_bus_write(vcpu, KVM_FAST_MMIO_BUS, gpa, 0, NULL)) {
+			trace_kvm_fast_mmio(gpa);
+			return kvm_skip_emulated_instruction(vcpu);
+		}
 	}
 
-	ret = handle_mmio_page_fault(vcpu, gpa, true);
-	vcpu->arch.gpa_available = !is_guest_mode(vcpu);
-	vcpu->arch.gpa_val = gpa;
-	if (likely(ret == RET_MMIO_PF_EMULATE))
-		return x86_emulate_instruction(vcpu, gpa, 0, NULL, 0) ==
-					      EMULATE_DONE;
-
-	if (unlikely(ret == RET_MMIO_PF_INVALID))
-		return kvm_mmu_page_fault(vcpu, gpa, 0, NULL, 0);
-
-	if (unlikely(ret == RET_MMIO_PF_RETRY))
-		return 1;
+	ret = kvm_mmu_page_fault(vcpu, gpa, PFERR_RSVD_MASK, NULL, 0);
+	if (ret >= 0)
+		return ret;
 
 	/* It is the real ept misconfig */
 	WARN_ON(1);
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 612067074905..2383d2ce0a84 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -90,7 +90,11 @@ static inline u32 bit(int bitno)
 static inline void vcpu_cache_mmio_info(struct kvm_vcpu *vcpu,
 					gva_t gva, gfn_t gfn, unsigned access)
 {
-	vcpu->arch.mmio_gva = gva & PAGE_MASK;
+	/*
+	 * If this is a shadow nested page table, the "GVA" is
+	 * actually a nested GPA.
+	 */
+	vcpu->arch.mmio_gva = mmu_is_nested(vcpu) ? 0 : gva & PAGE_MASK;
 	vcpu->arch.access = access;
 	vcpu->arch.mmio_gfn = gfn;
 	vcpu->arch.mmio_gen = kvm_memslots(vcpu->kvm)->generation;
-- 
1.8.3.1
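
P.S. For the first issue, the place that would need fixing is
vmx_flush_pml_buffer: entries that the CPU logged while L2 was running
are nGPAs, so marking them dirty as-is would dirty the wrong L1 pages.
A rough, untested sketch of what the flush loop would need follows;
translate_l2_gpa() is a made-up placeholder for the nGPA->GPA walk
through the nested EPT tables:

	for (; pml_idx < PML_ENTITY_NUM; pml_idx++) {
		u64 gpa = pml_buf[pml_idx];

		WARN_ON(gpa & (PAGE_SIZE - 1));
		/* While L2 runs, PML logs nGPAs rather than L1 GPAs. */
		if (is_guest_mode(vcpu))
			gpa = translate_l2_gpa(vcpu, gpa);	/* hypothetical */
		kvm_vcpu_mark_page_dirty(vcpu, gpa >> PAGE_SHIFT);
	}

The alternative would be to run L2 with PML disabled in vmcs02 and fall
back to write protection for dirty tracking, at the cost of extra
vmexits; I haven't thought through which is better.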