On Tue, Oct 1, 2019 at 2:23 PM Krish Sadhukhan <krish.sadhukhan@xxxxxxxxxx> wrote: > > > > On 10/01/2019 10:09 AM, Jim Mattson wrote: > > On Mon, Sep 30, 2019 at 5:12 PM Krish Sadhukhan > > <krish.sadhukhan@xxxxxxxxxx> wrote: > >> According to section “VM Entries” in Intel SDM vol 3C, VM-entry checks are > >> performed in a certain order. Checks on MSRs that are loaded on VM-entry > >> from VM-entry MSR-load area, should be done after verifying VMCS controls, > >> host-state area and guest-state area. As KVM relies on CPU hardware to > >> perform some of these checks, we need to defer VM-exit due to invalid > >> VM-entry MSR-load area to until after CPU hardware completes the earlier > >> checks and is ready to do VMLAUNCH/VMRESUME. > >> > >> In order to defer errors arising from invalid VM-entry MSR-load area in > >> vmcs12, we set up a single invalid entry, which is illegal according to > >> section "Loading MSRs in Intel SDM vol 3C, in VM-entry MSR-load area of > >> vmcs02. This will cause the CPU hardware to VM-exit with "VM-entry failure > >> due to MSR loading" after it completes checks on VMCS controls, host-state > >> area and guest-state area. We reflect a synthesized Exit Qualification to > >> our guest. > > This change addresses the priority inversion, but it still potentially > > leaves guest MSRs incorrectly loaded with values from the VMCS12 > > VM-entry MSR-load area when a higher priority error condition would > > have precluded any processing of the VM-entry MSR-load area. > > May be, we should not load any guest MSR until we have checked the > entire vmcs12 MSR-load area in nested_vmx_load_msr() ? That's not sufficient. We shouldn't load any guest MSR until all of the checks on host state, controls, and guest state have passed. Since we defer the guest state checks to hardware, that makes the proper ordering a bit problematic. Instead, you can try to arrange to undo the guest MSR loads if a higher priority error is discovered. That won't be trivial. > > > >> Suggested-by: Jim Mattson <jmattson@xxxxxxxxxx> > >> Signed-off-by: Krish Sadhukhan <krish.sadhukhan@xxxxxxxxxx> > >> Reviewed-by: Mihai Carabas <mihai.carabas@xxxxxxxxxx> > >> Reviewed-by: Liran Alon <liran.alon@xxxxxxxxxx> > >> --- > >> arch/x86/kvm/vmx/nested.c | 34 +++++++++++++++++++++++++++++++--- > >> arch/x86/kvm/vmx/nested.h | 14 ++++++++++++-- > >> arch/x86/kvm/vmx/vmcs.h | 6 ++++++ > >> 3 files changed, 49 insertions(+), 5 deletions(-) > >> > >> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c > >> index ced9fba32598..b74491c04090 100644 > >> --- a/arch/x86/kvm/vmx/nested.c > >> +++ b/arch/x86/kvm/vmx/nested.c > >> @@ -3054,12 +3054,40 @@ int nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu, bool from_vmentry) > >> goto vmentry_fail_vmexit_guest_mode; > >> > >> if (from_vmentry) { > >> - exit_reason = EXIT_REASON_MSR_LOAD_FAIL; > >> exit_qual = nested_vmx_load_msr(vcpu, > >> vmcs12->vm_entry_msr_load_addr, > >> vmcs12->vm_entry_msr_load_count); > >> - if (exit_qual) > >> - goto vmentry_fail_vmexit_guest_mode; > >> + if (exit_qual) { > >> + /* > >> + * According to section “VM Entries” in Intel SDM > >> + * vol 3C, VM-entry checks are performed in a certain > >> + * order. Checks on MSRs that are loaded on VM-entry > >> + * from VM-entry MSR-load area, should be done after > >> + * verifying VMCS controls, host-state area and > >> + * guest-state area. As KVM relies on CPU hardware to > >> + * perform some of these checks, we need to defer > >> + * VM-exit due to invalid VM-entry MSR-load area to > >> + * until after CPU hardware completes the earlier > >> + * checks and is ready to do VMLAUNCH/VMRESUME. > >> + * > >> + * In order to defer errors arising from invalid > >> + * VM-entry MSR-load area in vmcs12, we set up a > >> + * single invalid entry, which is illegal according > >> + * to section "Loading MSRs in Intel SDM vol 3C, in > >> + * VM-entry MSR-load area of vmcs02. This will cause > >> + * the CPU hardware to VM-exit with "VM-entry > >> + * failure due to MSR loading" after it completes > >> + * checks on VMCS controls, host-state area and > >> + * guest-state area. > >> + */ > >> + vmx->loaded_vmcs->invalid_msr_load_area.index = > >> + MSR_FS_BASE; > > Can this field be statically populated during initialization? > > Yes. > > > > >> + vmx->loaded_vmcs->invalid_msr_load_area.value = > >> + exit_qual; > > This seems awkward. Why not save 16 bytes per loaded_vmcs by > > allocating one invalid_msr_load_area system-wide and just add a 4 byte > > field to struct nested_vmx to store this value? > > OK. > > > > >> + vmcs_write32(VM_ENTRY_MSR_LOAD_COUNT, 1); > >> + vmcs_write64(VM_ENTRY_MSR_LOAD_ADDR, > >> + __pa(&(vmx->loaded_vmcs->invalid_msr_load_area))); > >> + } > > Do you need to set vmx->nested.dirty_vmcs12 to ensure that > > prepare_vmcs02_constant_state() will be called at the next emulated > > VM-entry, to undo this change to VM_ENTRY_MSR_LOAD_ADDR? > > Yes. > > > > >> } else { > >> /* > >> * The MMU is not initialized to point at the right entities yet and > >> diff --git a/arch/x86/kvm/vmx/nested.h b/arch/x86/kvm/vmx/nested.h > >> index 187d39bf0bf1..f3a384235b68 100644 > >> --- a/arch/x86/kvm/vmx/nested.h > >> +++ b/arch/x86/kvm/vmx/nested.h > >> @@ -64,7 +64,9 @@ static inline bool nested_ept_ad_enabled(struct kvm_vcpu *vcpu) > >> static inline int nested_vmx_reflect_vmexit(struct kvm_vcpu *vcpu, > >> u32 exit_reason) > >> { > >> + u32 exit_qual; > >> u32 exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO); > >> + struct vcpu_vmx *vmx = to_vmx(vcpu); > >> > >> /* > >> * At this point, the exit interruption info in exit_intr_info > >> @@ -81,8 +83,16 @@ static inline int nested_vmx_reflect_vmexit(struct kvm_vcpu *vcpu, > >> vmcs_read32(VM_EXIT_INTR_ERROR_CODE); > >> } > >> > >> - nested_vmx_vmexit(vcpu, exit_reason, exit_intr_info, > >> - vmcs_readl(EXIT_QUALIFICATION)); > >> + exit_qual = vmcs_readl(EXIT_QUALIFICATION); > >> + > >> + if (vmx->loaded_vmcs->invalid_msr_load_area.index == MSR_FS_BASE && > >> + (exit_reason == (VMX_EXIT_REASONS_FAILED_VMENTRY | > >> + EXIT_REASON_MSR_LOAD_FAIL))) { > > Is the second conjunct sufficient? i.e. Isn't there a bug in kvm if > > the second conjunct is true and the first is not? > > If the first conjunct isn't true and the second one is, it means it's a > hardware-detected MSR-load error. Right ? So won't that be handled the > same way it is handled currently in KVM ? I believe that KVM will currently forward such an error to L1, which is wrong, since L1 has no control over vmcs02's VM-entry MSR-load area. The assumption is that this will never happen, because L0 is too careful. > > > >> + exit_qual = vmx->loaded_vmcs->invalid_msr_load_area.value; > >> + } > >> + > >> + nested_vmx_vmexit(vcpu, exit_reason, exit_intr_info, exit_qual); > >> + > >> return 1; > >> } > >> > >> diff --git a/arch/x86/kvm/vmx/vmcs.h b/arch/x86/kvm/vmx/vmcs.h > >> index 481ad879197b..e272788bd4b8 100644 > >> --- a/arch/x86/kvm/vmx/vmcs.h > >> +++ b/arch/x86/kvm/vmx/vmcs.h > >> @@ -70,6 +70,12 @@ struct loaded_vmcs { > >> struct list_head loaded_vmcss_on_cpu_link; > >> struct vmcs_host_state host_state; > >> struct vmcs_controls_shadow controls_shadow; > >> + /* > >> + * This field is used to set up an invalid VM-entry MSR-load area > >> + * for vmcs02 if an error is detected while processing the entries > >> + * in VM-entry MSR-load area of vmcs12. > >> + */ > >> + struct vmx_msr_entry invalid_msr_load_area; > >> }; > > I'd suggest allocating just one invalid_msr_load_area system-wide, as > > mentioned above. > > > >> static inline bool is_exception_n(u32 intr_info, u8 vector) > >> -- > >> 2.20.1 > >> >