On 10/01/2019 10:09 AM, Jim Mattson wrote:
On Mon, Sep 30, 2019 at 5:12 PM Krish Sadhukhan
<krish.sadhukhan@xxxxxxxxxx> wrote:
According to section “VM Entries” in Intel SDM vol 3C, VM-entry checks are
performed in a certain order. Checks on MSRs that are loaded on VM-entry
from VM-entry MSR-load area, should be done after verifying VMCS controls,
host-state area and guest-state area. As KVM relies on CPU hardware to
perform some of these checks, we need to defer VM-exit due to invalid
VM-entry MSR-load area to until after CPU hardware completes the earlier
checks and is ready to do VMLAUNCH/VMRESUME.
In order to defer errors arising from invalid VM-entry MSR-load area in
vmcs12, we set up a single invalid entry, which is illegal according to
section "Loading MSRs in Intel SDM vol 3C, in VM-entry MSR-load area of
vmcs02. This will cause the CPU hardware to VM-exit with "VM-entry failure
due to MSR loading" after it completes checks on VMCS controls, host-state
area and guest-state area. We reflect a synthesized Exit Qualification to
our guest.
This change addresses the priority inversion, but it still potentially
leaves guest MSRs incorrectly loaded with values from the VMCS12
VM-entry MSR-load area when a higher priority error condition would
have precluded any processing of the VM-entry MSR-load area.
May be, we should not load any guest MSR until we have checked the
entire vmcs12 MSR-load area in nested_vmx_load_msr() ?
Suggested-by: Jim Mattson <jmattson@xxxxxxxxxx>
Signed-off-by: Krish Sadhukhan <krish.sadhukhan@xxxxxxxxxx>
Reviewed-by: Mihai Carabas <mihai.carabas@xxxxxxxxxx>
Reviewed-by: Liran Alon <liran.alon@xxxxxxxxxx>
---
arch/x86/kvm/vmx/nested.c | 34 +++++++++++++++++++++++++++++++---
arch/x86/kvm/vmx/nested.h | 14 ++++++++++++--
arch/x86/kvm/vmx/vmcs.h | 6 ++++++
3 files changed, 49 insertions(+), 5 deletions(-)
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index ced9fba32598..b74491c04090 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -3054,12 +3054,40 @@ int nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu, bool from_vmentry)
goto vmentry_fail_vmexit_guest_mode;
if (from_vmentry) {
- exit_reason = EXIT_REASON_MSR_LOAD_FAIL;
exit_qual = nested_vmx_load_msr(vcpu,
vmcs12->vm_entry_msr_load_addr,
vmcs12->vm_entry_msr_load_count);
- if (exit_qual)
- goto vmentry_fail_vmexit_guest_mode;
+ if (exit_qual) {
+ /*
+ * According to section “VM Entries” in Intel SDM
+ * vol 3C, VM-entry checks are performed in a certain
+ * order. Checks on MSRs that are loaded on VM-entry
+ * from VM-entry MSR-load area, should be done after
+ * verifying VMCS controls, host-state area and
+ * guest-state area. As KVM relies on CPU hardware to
+ * perform some of these checks, we need to defer
+ * VM-exit due to invalid VM-entry MSR-load area to
+ * until after CPU hardware completes the earlier
+ * checks and is ready to do VMLAUNCH/VMRESUME.
+ *
+ * In order to defer errors arising from invalid
+ * VM-entry MSR-load area in vmcs12, we set up a
+ * single invalid entry, which is illegal according
+ * to section "Loading MSRs in Intel SDM vol 3C, in
+ * VM-entry MSR-load area of vmcs02. This will cause
+ * the CPU hardware to VM-exit with "VM-entry
+ * failure due to MSR loading" after it completes
+ * checks on VMCS controls, host-state area and
+ * guest-state area.
+ */
+ vmx->loaded_vmcs->invalid_msr_load_area.index =
+ MSR_FS_BASE;
Can this field be statically populated during initialization?
Yes.
+ vmx->loaded_vmcs->invalid_msr_load_area.value =
+ exit_qual;
This seems awkward. Why not save 16 bytes per loaded_vmcs by
allocating one invalid_msr_load_area system-wide and just add a 4 byte
field to struct nested_vmx to store this value?
OK.
+ vmcs_write32(VM_ENTRY_MSR_LOAD_COUNT, 1);
+ vmcs_write64(VM_ENTRY_MSR_LOAD_ADDR,
+ __pa(&(vmx->loaded_vmcs->invalid_msr_load_area)));
+ }
Do you need to set vmx->nested.dirty_vmcs12 to ensure that
prepare_vmcs02_constant_state() will be called at the next emulated
VM-entry, to undo this change to VM_ENTRY_MSR_LOAD_ADDR?
Yes.
} else {
/*
* The MMU is not initialized to point at the right entities yet and
diff --git a/arch/x86/kvm/vmx/nested.h b/arch/x86/kvm/vmx/nested.h
index 187d39bf0bf1..f3a384235b68 100644
--- a/arch/x86/kvm/vmx/nested.h
+++ b/arch/x86/kvm/vmx/nested.h
@@ -64,7 +64,9 @@ static inline bool nested_ept_ad_enabled(struct kvm_vcpu *vcpu)
static inline int nested_vmx_reflect_vmexit(struct kvm_vcpu *vcpu,
u32 exit_reason)
{
+ u32 exit_qual;
u32 exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO);
+ struct vcpu_vmx *vmx = to_vmx(vcpu);
/*
* At this point, the exit interruption info in exit_intr_info
@@ -81,8 +83,16 @@ static inline int nested_vmx_reflect_vmexit(struct kvm_vcpu *vcpu,
vmcs_read32(VM_EXIT_INTR_ERROR_CODE);
}
- nested_vmx_vmexit(vcpu, exit_reason, exit_intr_info,
- vmcs_readl(EXIT_QUALIFICATION));
+ exit_qual = vmcs_readl(EXIT_QUALIFICATION);
+
+ if (vmx->loaded_vmcs->invalid_msr_load_area.index == MSR_FS_BASE &&
+ (exit_reason == (VMX_EXIT_REASONS_FAILED_VMENTRY |
+ EXIT_REASON_MSR_LOAD_FAIL))) {
Is the second conjunct sufficient? i.e. Isn't there a bug in kvm if
the second conjunct is true and the first is not?
If the first conjunct isn't true and the second one is, it means it's a
hardware-detected MSR-load error. Right ? So won't that be handled the
same way it is handled currently in KVM ?
+ exit_qual = vmx->loaded_vmcs->invalid_msr_load_area.value;
+ }
+
+ nested_vmx_vmexit(vcpu, exit_reason, exit_intr_info, exit_qual);
+
return 1;
}
diff --git a/arch/x86/kvm/vmx/vmcs.h b/arch/x86/kvm/vmx/vmcs.h
index 481ad879197b..e272788bd4b8 100644
--- a/arch/x86/kvm/vmx/vmcs.h
+++ b/arch/x86/kvm/vmx/vmcs.h
@@ -70,6 +70,12 @@ struct loaded_vmcs {
struct list_head loaded_vmcss_on_cpu_link;
struct vmcs_host_state host_state;
struct vmcs_controls_shadow controls_shadow;
+ /*
+ * This field is used to set up an invalid VM-entry MSR-load area
+ * for vmcs02 if an error is detected while processing the entries
+ * in VM-entry MSR-load area of vmcs12.
+ */
+ struct vmx_msr_entry invalid_msr_load_area;
};
I'd suggest allocating just one invalid_msr_load_area system-wide, as
mentioned above.
static inline bool is_exception_n(u32 intr_info, u8 vector)
--
2.20.1