On Mon, Mar 15, 2021 at 09:45:11AM -0700, Sean Christopherson wrote: > On Mon, Mar 15, 2021, Yang Weijiang wrote: > > These fields are rarely updated by L1 QEMU/KVM, sync them when L1 is trying to > > read/write them and after they're changed. If CET guest entry-load bit is not > > set by L1 guest, migrate them to L2 manaully. > > > > Opportunistically remove one blank line and add minor fix for MPX. > > > > Suggested-by: Sean Christopherson <seanjc@xxxxxxxxxx> > > Signed-off-by: Yang Weijiang <weijiang.yang@xxxxxxxxx> > > --- > > arch/x86/kvm/cpuid.c | 1 - > > arch/x86/kvm/vmx/nested.c | 35 +++++++++++++++++++++++++++++++++-- > > arch/x86/kvm/vmx/vmx.h | 3 +++ > > 3 files changed, 36 insertions(+), 3 deletions(-) > > > > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c > > index d191de769093..8692f53b8cd0 100644 > > --- a/arch/x86/kvm/cpuid.c > > +++ b/arch/x86/kvm/cpuid.c > > @@ -143,7 +143,6 @@ void kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu) > > } > > vcpu->arch.guest_supported_xss = > > (((u64)best->edx << 32) | best->ecx) & supported_xss; > > - > > Spurious whitespace deletion. Yes, Opportunistically did it as said in commit log :-D > > > } else { > > vcpu->arch.guest_supported_xss = 0; > > } > > diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c > > index 9728efd529a1..57ecd8225568 100644 > > --- a/arch/x86/kvm/vmx/nested.c > > +++ b/arch/x86/kvm/vmx/nested.c > > @@ -2516,6 +2516,13 @@ static void prepare_vmcs02_rare(struct vcpu_vmx *vmx, struct vmcs12 *vmcs12) > > vmcs_write32(VM_ENTRY_MSR_LOAD_COUNT, vmx->msr_autoload.guest.nr); > > > > set_cr4_guest_host_mask(vmx); > > + > > + if (kvm_cet_supported() && vmx->nested.nested_run_pending && > > + (vmcs12->vm_entry_controls & VM_ENTRY_LOAD_CET_STATE)) { > > + vmcs_writel(GUEST_SSP, vmcs12->guest_ssp); > > + vmcs_writel(GUEST_S_CET, vmcs12->guest_s_cet); > > + vmcs_writel(GUEST_INTR_SSP_TABLE, vmcs12->guest_ssp_tbl); > > + } > > } > > > > /* > > @@ -2556,6 +2563,15 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12, > > if (kvm_mpx_supported() && (!vmx->nested.nested_run_pending || > > !(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_BNDCFGS))) > > vmcs_write64(GUEST_BNDCFGS, vmx->nested.vmcs01_guest_bndcfgs); > > + > > + if (kvm_cet_supported() && (!vmx->nested.nested_run_pending || > > + !(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_CET_STATE))) { > > Not your code per se, since this pattern comes from BNDCFGS and DEBUGCTL, but I > don't see how loading vmcs01 state in this combo is correct: > > a. kvm_xxx_supported() == 1 > b. nested_run_pending == false > c. vm_entry_controls.load_xxx_state == true > > nested_vmx_enter_non_root_mode() only snapshots vmcs01 if > vm_entry_controls.load_xxx_state == false, which means the above combo is > loading stale values (or more likely, zeros). > > I _think_ nested_vmx_enter_non_root_mode() just needs to snapshot vmcs01 if > nested_run_pending=false. For migration, if userspace restores MSRs after > KVM_SET_NESTED_STATE, then what's done here is likely irrelevant. If userspace > restores MSRs before nested state, then vmcs01 will hold the desired value since > setting MSRs would have written the value into vmcs01. Then the code nested_vmx_enter_non_root_mode() would look like: if (kvm_cet_supported() && !vmx->nested.nested_run_pending && !(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_CET_STATE)) { ... } I have another concern now, if vm_entry_controls.load_cet_state == false, and L1 updated vmcs fields, so the latest states are in vmcs12, but they cannot be synced to vmcs02 because in prepare_vmcs02_rare(): if (kvm_cet_supported() && vmx->nested.nested_run_pending && (vmcs12->vm_entry_controls & VM_ENTRY_LOAD_CET_STATE)) { ... } so L2 got stale status. IMO, L1 guest sets vm_entry_controls.load_cet_state == false should be rare case. We event can igore this case :-) > > I suspect no one has reported this issue because guests simply don't use MPX, > and up until the recent LBR stuff, KVM effectively zeroed out DEBUGCTL for the > guest. > So for MPX and DEBUGCTL, is it worth some separate fix patch? > diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c > index 45622e9c4449..4184ff601120 100644 > --- a/arch/x86/kvm/vmx/nested.c > +++ b/arch/x86/kvm/vmx/nested.c > @@ -3298,10 +3298,11 @@ enum nvmx_vmentry_status nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu, > if (likely(!evaluate_pending_interrupts) && kvm_vcpu_apicv_active(vcpu)) > evaluate_pending_interrupts |= vmx_has_apicv_interrupt(vcpu); > > - if (!(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_DEBUG_CONTROLS)) > + if (!vmx->nested.nested_run_pending || > + !(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_DEBUG_CONTROLS)) > vmx->nested.vmcs01_debugctl = vmcs_read64(GUEST_IA32_DEBUGCTL); > - if (kvm_mpx_supported() && > - !(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_BNDCFGS)) > + if (kvm_mpx_supported() && (!vmx->nested.nested_run_pending || > + !(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_BNDCFGS))) > vmx->nested.vmcs01_guest_bndcfgs = vmcs_read64(GUEST_BNDCFGS); > > /* > > > Side topic, all of this code is broken for SMM emulation. SMI+RSM don't do a > full VM-Exit -> VM-Entry; the CPU forcefully exits non-root, but most state that > is loaded from the VMCS is left untouched. It's the SMI handler's responsibility > to not enable features, e.g. to not set CR4.CET. For sane use cases, this > probably doesn't matter as vmcs12 will be configured to context switch state, > but if L1 is doing anything out of the ordinary, SMI+RSM will corrupt state. > > E.g. if L1 enables MPX in the guest, does not intercept L2 writes to BNDCFGS, > and does not load BNDCFGS on VM-Entry, then SMI+RSM would corrupt BNDCFGS since > the SMI "exit" would clear BNDCFGS, and the RSM "entry" would load zero. This > is 100% contrived, and probably doesn't impact real world use cases, but it > still bugs me :-) Exactly, should it be fixed by separate patch or leave it as is? > > > + vmcs_writel(GUEST_SSP, vmx->nested.vmcs01_guest_ssp); > > + vmcs_writel(GUEST_S_CET, vmx->nested.vmcs01_guest_s_cet); > > + vmcs_writel(GUEST_INTR_SSP_TABLE, > > + vmx->nested.vmcs01_guest_ssp_tbl); > > + } > > + > > vmx_set_rflags(vcpu, vmcs12->guest_rflags); > > > > /* EXCEPTION_BITMAP and CR0_GUEST_HOST_MASK should basically be the > > @@ -3373,8 +3389,14 @@ enum nvmx_vmentry_status nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu, > > if (!(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_DEBUG_CONTROLS)) > > vmx->nested.vmcs01_debugctl = vmcs_read64(GUEST_IA32_DEBUGCTL); > > if (kvm_mpx_supported() && > > - !(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_BNDCFGS)) > > + !(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_BNDCFGS)) > > vmx->nested.vmcs01_guest_bndcfgs = vmcs_read64(GUEST_BNDCFGS); > > + if (kvm_cet_supported() && > > + !(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_CET_STATE)) { > > + vmx->nested.vmcs01_guest_ssp = vmcs_readl(GUEST_SSP); > > + vmx->nested.vmcs01_guest_s_cet = vmcs_readl(GUEST_S_CET); > > + vmx->nested.vmcs01_guest_ssp_tbl = vmcs_readl(GUEST_INTR_SSP_TABLE); > > + } > > > > /* > > * Overwrite vmcs01.GUEST_CR3 with L1's CR3 if EPT is disabled *and* > > @@ -4001,6 +4023,9 @@ static bool is_vmcs12_ext_field(unsigned long field) > > case GUEST_IDTR_BASE: > > case GUEST_PENDING_DBG_EXCEPTIONS: > > case GUEST_BNDCFGS: > > + case GUEST_SSP: > > + case GUEST_INTR_SSP_TABLE: > > + case GUEST_S_CET: > > return true; > > default: > > break; > > @@ -4050,8 +4075,14 @@ static void sync_vmcs02_to_vmcs12_rare(struct kvm_vcpu *vcpu, > > vmcs12->guest_idtr_base = vmcs_readl(GUEST_IDTR_BASE); > > vmcs12->guest_pending_dbg_exceptions = > > vmcs_readl(GUEST_PENDING_DBG_EXCEPTIONS); > > - if (kvm_mpx_supported()) > > + if (kvm_mpx_supported() && guest_cpuid_has(vcpu, X86_FEATURE_MPX)) > > Adding the CPUID check for MPX definitely needs to be a separate commit. Sure, will fix them by separate patch. Thanks for review! > > > vmcs12->guest_bndcfgs = vmcs_read64(GUEST_BNDCFGS); > > + if (kvm_cet_supported() && (guest_cpuid_has(vcpu, X86_FEATURE_SHSTK) || > > + guest_cpuid_has(vcpu, X86_FEATURE_IBT))) { > > + vmcs12->guest_ssp = vmcs_readl(GUEST_SSP); > > + vmcs12->guest_s_cet = vmcs_readl(GUEST_S_CET); > > + vmcs12->guest_ssp_tbl = vmcs_readl(GUEST_INTR_SSP_TABLE); > > + } > > > > vmx->nested.need_sync_vmcs02_to_vmcs12_rare = false; > > } > > diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h > > index 9d3a557949ac..36dc4fdb0909 100644 > > --- a/arch/x86/kvm/vmx/vmx.h > > +++ b/arch/x86/kvm/vmx/vmx.h > > @@ -155,6 +155,9 @@ struct nested_vmx { > > /* to migrate it to L2 if VM_ENTRY_LOAD_DEBUG_CONTROLS is off */ > > u64 vmcs01_debugctl; > > u64 vmcs01_guest_bndcfgs; > > + u64 vmcs01_guest_ssp; > > + u64 vmcs01_guest_s_cet; > > + u64 vmcs01_guest_ssp_tbl; > > > > /* to migrate it to L1 if L2 writes to L1's CR8 directly */ > > int l1_tpr_threshold; > > -- > > 2.26.2 > >