On Mon, Sep 30, 2019 at 5:12 PM Krish Sadhukhan <krish.sadhukhan@xxxxxxxxxx> wrote:
>
> According to section “VM Entries” in Intel SDM vol 3C, VM-entry checks are
> performed in a certain order. Checks on MSRs that are loaded on VM-entry
> from the VM-entry MSR-load area should be done after verifying VMCS
> controls, the host-state area and the guest-state area. As KVM relies on
> CPU hardware to perform some of these checks, we need to defer the VM-exit
> due to an invalid VM-entry MSR-load area until after CPU hardware
> completes the earlier checks and is ready to do VMLAUNCH/VMRESUME.
>
> In order to defer errors arising from an invalid VM-entry MSR-load area
> in vmcs12, we set up a single invalid entry, which is illegal according to
> section "Loading MSRs" in Intel SDM vol 3C, in the VM-entry MSR-load area
> of vmcs02. This will cause the CPU hardware to VM-exit with "VM-entry
> failure due to MSR loading" after it completes checks on VMCS controls,
> the host-state area and the guest-state area. We reflect a synthesized
> Exit Qualification to our guest.

This change addresses the priority inversion, but it still potentially
leaves guest MSRs incorrectly loaded with values from the VMCS12
VM-entry MSR-load area when a higher-priority error condition would
have precluded any processing of the VM-entry MSR-load area.

> Suggested-by: Jim Mattson <jmattson@xxxxxxxxxx>
> Signed-off-by: Krish Sadhukhan <krish.sadhukhan@xxxxxxxxxx>
> Reviewed-by: Mihai Carabas <mihai.carabas@xxxxxxxxxx>
> Reviewed-by: Liran Alon <liran.alon@xxxxxxxxxx>
> ---
>  arch/x86/kvm/vmx/nested.c | 34 +++++++++++++++++++++++++++++++---
>  arch/x86/kvm/vmx/nested.h | 14 ++++++++++++--
>  arch/x86/kvm/vmx/vmcs.h   |  6 ++++++
>  3 files changed, 49 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index ced9fba32598..b74491c04090 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -3054,12 +3054,40 @@ int nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu, bool from_vmentry)
>                 goto vmentry_fail_vmexit_guest_mode;
>
>         if (from_vmentry) {
> -               exit_reason = EXIT_REASON_MSR_LOAD_FAIL;
>                 exit_qual = nested_vmx_load_msr(vcpu,
>                                                 vmcs12->vm_entry_msr_load_addr,
>                                                 vmcs12->vm_entry_msr_load_count);
> -               if (exit_qual)
> -                       goto vmentry_fail_vmexit_guest_mode;
> +               if (exit_qual) {
> +                       /*
> +                        * According to section “VM Entries” in Intel SDM
> +                        * vol 3C, VM-entry checks are performed in a
> +                        * certain order. Checks on MSRs that are loaded
> +                        * on VM-entry from the VM-entry MSR-load area
> +                        * should be done after verifying VMCS controls,
> +                        * the host-state area and the guest-state area.
> +                        * As KVM relies on CPU hardware to perform some
> +                        * of these checks, we need to defer the VM-exit
> +                        * due to an invalid VM-entry MSR-load area until
> +                        * after CPU hardware completes the earlier
> +                        * checks and is ready to do VMLAUNCH/VMRESUME.
> +                        *
> +                        * In order to defer errors arising from an
> +                        * invalid VM-entry MSR-load area in vmcs12, we
> +                        * set up a single invalid entry, which is illegal
> +                        * according to section "Loading MSRs" in Intel
> +                        * SDM vol 3C, in the VM-entry MSR-load area of
> +                        * vmcs02. This will cause the CPU hardware to
> +                        * VM-exit with "VM-entry failure due to MSR
> +                        * loading" after it completes checks on VMCS
> +                        * controls, host-state area and guest-state
> +                        * area.
> +                        */
> +                       vmx->loaded_vmcs->invalid_msr_load_area.index =
> +                               MSR_FS_BASE;

Can this field be statically populated during initialization?

> +                       vmx->loaded_vmcs->invalid_msr_load_area.value =
> +                               exit_qual;

This seems awkward. Why not save 16 bytes per loaded_vmcs by
allocating one invalid_msr_load_area system-wide and just adding a
4-byte field to struct nested_vmx to store this value?
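Concretely, something along these lines is what I have in mind (rough,
untested sketch; the names below are mine, not from this patch):

        /*
         * One system-wide MSR-load area holding a single entry that is
         * illegal per section "Loading MSRs" in Intel SDM vol 3C:
         * loading MSR_FS_BASE from an MSR-load area always fails, so
         * pointing vmcs02 at this area guarantees "VM-entry failure
         * due to MSR loading" once the earlier hardware checks pass.
         */
        static struct vmx_msr_entry invalid_msr_load_area = {
                .index = MSR_FS_BASE,
        };

plus a single field in struct nested_vmx:

        /* Exit qualification to synthesize for a deferred MSR-load failure. */
        u32 invalid_msr_load_exit_qual;

so that the failure path above reduces to:

        vmx->nested.invalid_msr_load_exit_qual = exit_qual;
        vmcs_write32(VM_ENTRY_MSR_LOAD_COUNT, 1);
        vmcs_write64(VM_ENTRY_MSR_LOAD_ADDR,
                     __pa_symbol(&invalid_msr_load_area));

(Note __pa_symbol() rather than __pa(), since the area would now be a
statically allocated kernel symbol instead of part of a heap-allocated
loaded_vmcs.)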
> +                       vmcs_write32(VM_ENTRY_MSR_LOAD_COUNT, 1);
> +                       vmcs_write64(VM_ENTRY_MSR_LOAD_ADDR,
> +                               __pa(&(vmx->loaded_vmcs->invalid_msr_load_area)));
> +               }

Do you need to set vmx->nested.dirty_vmcs12 to ensure that
prepare_vmcs02_constant_state() will be called at the next emulated
VM-entry, to undo this change to VM_ENTRY_MSR_LOAD_ADDR?

>         } else {
>                 /*
>                  * The MMU is not initialized to point at the right entities yet and
> diff --git a/arch/x86/kvm/vmx/nested.h b/arch/x86/kvm/vmx/nested.h
> index 187d39bf0bf1..f3a384235b68 100644
> --- a/arch/x86/kvm/vmx/nested.h
> +++ b/arch/x86/kvm/vmx/nested.h
> @@ -64,7 +64,9 @@ static inline bool nested_ept_ad_enabled(struct kvm_vcpu *vcpu)
>  static inline int nested_vmx_reflect_vmexit(struct kvm_vcpu *vcpu,
>                                             u32 exit_reason)
>  {
> +       u32 exit_qual;
>         u32 exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO);
> +       struct vcpu_vmx *vmx = to_vmx(vcpu);
>
>         /*
>          * At this point, the exit interruption info in exit_intr_info
> @@ -81,8 +83,16 @@ static inline int nested_vmx_reflect_vmexit(struct kvm_vcpu *vcpu,
>                         vmcs_read32(VM_EXIT_INTR_ERROR_CODE);
>         }
>
> -       nested_vmx_vmexit(vcpu, exit_reason, exit_intr_info,
> -                         vmcs_readl(EXIT_QUALIFICATION));
> +       exit_qual = vmcs_readl(EXIT_QUALIFICATION);
> +
> +       if (vmx->loaded_vmcs->invalid_msr_load_area.index == MSR_FS_BASE &&
> +           (exit_reason == (VMX_EXIT_REASONS_FAILED_VMENTRY |
> +            EXIT_REASON_MSR_LOAD_FAIL))) {

Is the second conjunct sufficient? I.e., isn't there a bug in KVM if
the second conjunct is true and the first is not? (See the sketch at
the end of this mail.)

> +               exit_qual = vmx->loaded_vmcs->invalid_msr_load_area.value;
> +       }
> +
> +       nested_vmx_vmexit(vcpu, exit_reason, exit_intr_info, exit_qual);
> +
>         return 1;
>  }
>
> diff --git a/arch/x86/kvm/vmx/vmcs.h b/arch/x86/kvm/vmx/vmcs.h
> index 481ad879197b..e272788bd4b8 100644
> --- a/arch/x86/kvm/vmx/vmcs.h
> +++ b/arch/x86/kvm/vmx/vmcs.h
> @@ -70,6 +70,12 @@ struct loaded_vmcs {
>         struct list_head loaded_vmcss_on_cpu_link;
>         struct vmcs_host_state host_state;
>         struct vmcs_controls_shadow controls_shadow;
> +       /*
> +        * This field is used to set up an invalid VM-entry MSR-load area
> +        * for vmcs02 if an error is detected while processing the entries
> +        * in the VM-entry MSR-load area of vmcs12.
> +        */
> +       struct vmx_msr_entry invalid_msr_load_area;
>  };

I'd suggest allocating just one invalid_msr_load_area system-wide, as
mentioned above.

>  static inline bool is_exception_n(u32 intr_info, u8 vector)
> --
> 2.20.1
>
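P.S. To make that suggestion concrete, the reflection path could then
look something like this (again an untested sketch, reusing the
hypothetical invalid_msr_load_exit_qual field from my comment above):

        exit_qual = vmcs_readl(EXIT_QUALIFICATION);

        if (exit_reason == (VMX_EXIT_REASONS_FAILED_VMENTRY |
                            EXIT_REASON_MSR_LOAD_FAIL)) {
                /*
                 * vmcs02's MSR-load area is either KVM's own autoload
                 * area or the deliberately illegal single-entry area,
                 * so an MSR-load failure that KVM didn't plant itself
                 * would be a KVM bug.
                 */
                WARN_ON_ONCE(!vmx->nested.invalid_msr_load_exit_qual);
                exit_qual = vmx->nested.invalid_msr_load_exit_qual;
                vmx->nested.invalid_msr_load_exit_qual = 0;
        }

        nested_vmx_vmexit(vcpu, exit_reason, exit_intr_info, exit_qual);

Clearing the field after use keeps the WARN meaningful across
subsequent VM-exits.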