CC Kevin, as I am not sure whether Intel is aware of this issue; it
breaks other hypervisors, e.g. Xen, as well.
On 05/11/2017 07:23 PM, Paolo Bonzini wrote:
The new ept_access_test_paddr_read_only_ad_disabled testcase
caused an infinite stream of EPT violations because KVM did not
find anything bad in the page tables and kept re-executing the
faulting instruction.
This is because the exit qualification said we were reading from
the page tables, but actually the cause of the EPT violation
was writing the A/D bits. This happened even with eptad=0, quite
surprisingly.
Thus, always treat guest page table accesses as read+write operations,
even if the exit qualification says otherwise. This fixes the
testcase.
Signed-off-by: Paolo Bonzini <pbonzini@xxxxxxxxxx>
---
arch/x86/kvm/vmx.c | 36 +++++++++++++++++++++++-------------
1 file changed, 23 insertions(+), 13 deletions(-)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index c6f4ad44aa95..c868cbdad29a 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -6209,17 +6209,19 @@ static int handle_ept_violation(struct kvm_vcpu *vcpu)
u32 error_code;
exit_qualification = vmcs_readl(EXIT_QUALIFICATION);
+ gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS);
+ trace_kvm_page_fault(gpa, exit_qualification);
- if (is_guest_mode(vcpu)
- && !(exit_qualification & EPT_VIOLATION_GVA_TRANSLATED)) {
- /*
- * Fix up exit_qualification according to whether guest
- * page table accesses are reads or writes.
- */
- u64 eptp = nested_ept_get_cr3(vcpu);
- if (!(eptp & VMX_EPT_AD_ENABLE_BIT))
- exit_qualification &= ~EPT_VIOLATION_ACC_WRITE;
- }
+ /*
+ * All guest page table accesses are potential writes to A/D bits,
+ * but EPT microcode only reports them as such when EPT A/D is
+ * enabled. Tracing ept_access_test_paddr_read_only_ad_disabled (from
+ * kvm-unit-tests) with eptad=0 and eptad=1 shows that the processor
+ * does not change its behavior when EPTP enables A/D bits; the only
+ * difference is in the exit qualification. So fix this up here.
+ */
+ if (!(exit_qualification & EPT_VIOLATION_GVA_TRANSLATED))
+ exit_qualification |= EPT_VIOLATION_ACC_WRITE;
/*
* EPT violation happened while executing iret from NMI,
@@ -6231,9 +6233,6 @@ static int handle_ept_violation(struct kvm_vcpu *vcpu)
(exit_qualification & INTR_INFO_UNBLOCK_NMI))
vmcs_set_bits(GUEST_INTERRUPTIBILITY_INFO, GUEST_INTR_STATE_NMI);
- gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS);
- trace_kvm_page_fault(gpa, exit_qualification);
-
/* Is it a read fault? */
error_code = (exit_qualification & EPT_VIOLATION_ACC_READ)
? PFERR_USER_MASK : 0;
@@ -6250,6 +6249,17 @@ static int handle_ept_violation(struct kvm_vcpu *vcpu)
? PFERR_PRESENT_MASK : 0;
vcpu->arch.gpa_available = true;
+
+ if (is_guest_mode(vcpu)
+ && !(exit_qualification & EPT_VIOLATION_GVA_TRANSLATED)) {
+ /*
+ * Now fix up exit_qualification according to what the
+ * L1 hypervisor expects to see.
+ */
+ u64 eptp = nested_ept_get_cr3(vcpu);
+ if (!(eptp & VMX_EPT_AD_ENABLE_BIT))
+ exit_qualification &= ~EPT_VIOLATION_ACC_WRITE;
+ }
I am not sure this is really needed; it (PFEC.W = 0 when A/D bits need to be
set on the paging structures) is not what we expect.
Maybe it is better to always report the correct behavior? Especially since
Intel may fix its microcode, as the issue affects the newest CPUs as well.
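To make the question concrete, here is a minimal standalone sketch of the two fix-up steps in the patch. It is not the KVM code: the helper names fixup_for_host() and fixup_for_l1() are hypothetical, and the bit positions are assumptions taken from my reading of the SDM (exit qualification bit 1 = write access, bit 8 = access to the translated address; EPTP bit 6 = A/D enable).

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions; the names mirror the macros used in the patch. */
#define EPT_VIOLATION_ACC_WRITE      (1ull << 1)
#define EPT_VIOLATION_GVA_TRANSLATED (1ull << 8)
#define VMX_EPT_AD_ENABLE_BIT        (1ull << 6)

/*
 * Step 1 (hypothetical helper): the value KVM itself acts on.
 * A violation on a guest page table access is always treated as a
 * write, because the processor may be updating A/D bits.
 */
static uint64_t fixup_for_host(uint64_t exit_qualification)
{
	if (!(exit_qualification & EPT_VIOLATION_GVA_TRANSLATED))
		exit_qualification |= EPT_VIOLATION_ACC_WRITE;
	return exit_qualification;
}

/*
 * Step 2 (hypothetical helper): the value reflected to L1.
 * If L1 runs L2 with EPT A/D disabled, the write bit is cleared again
 * so that L1 sees what hardware would have reported.
 */
static uint64_t fixup_for_l1(uint64_t exit_qualification, bool guest_mode,
			     uint64_t l1_eptp)
{
	if (guest_mode &&
	    !(exit_qualification & EPT_VIOLATION_GVA_TRANSLATED) &&
	    !(l1_eptp & VMX_EPT_AD_ENABLE_BIT))
		exit_qualification &= ~EPT_VIOLATION_ACC_WRITE;
	return exit_qualification;
}
```

Dropping step 2 would mean L1 always sees the value produced by step 1, which is the alternative I am suggesting.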
Thanks!