> From: Sean Christopherson <seanjc@xxxxxxxxxx> > Sent: Wednesday, December 29, 2021 8:10 AM > > On Wed, Dec 22, 2021, Jing Liu wrote: > > Guest IA32_XFD_ERR is generally modified in two places: > > > > - Set by CPU when #NM is triggered; > > - Cleared by guest in its #NM handler; > > > > Intercept #NM for the first case, if guest writes XFD as nonzero for > > the first time which indicates guest is possible to use XFD generating > > the exception. #NM is rare if the guest doesn't use dynamic features. > > Otherwise, there is at most one exception per guest task given a > > dynamic feature. > > > > Save the current XFD_ERR value to the guest_fpu container in the #NM > > VM-exit handler. This must be done with interrupt/preemption disabled, > > Assuming my below understanding is correct, drop the "preemption" bit, it's > misleading. code-wise yes. In concept we just want to highlight that this operation must be completed when both interrupt and preemption are disabled. But we can also drop preemption if you prefer to, since preemption is certainly disabled when interrupt is disabled. > > > otherwise the unsaved MSR value may be clobbered by host operations. > > > > Inject a virtual #NM to the guest after saving the MSR value. > > > > Restore the host value (always ZERO outside of the host #NM > > handler) before enabling preemption. > > AIUI, changelog is wrong, code is right. This must be done before _IRQs_ are > enabled, same as handling TIF_NEED_FPU_LOAD. yes > > > Restore the guest value from the guest_fpu container right before > > entering the guest (with preemption disabled). > > Same complaint about preemption. > > > Suggested-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx> > > Signed-off-by: Jing Liu <jing2.liu@xxxxxxxxx> > > --- > > arch/x86/include/asm/kvm_host.h | 1 + > > arch/x86/kvm/vmx/vmcs.h | 5 +++++ > > arch/x86/kvm/vmx/vmx.c | 22 +++++++++++++++++++++- > > arch/x86/kvm/x86.c | 6 ++++++ > > 4 files changed, 33 insertions(+), 1 deletion(-) > > > > diff --git a/arch/x86/include/asm/kvm_host.h > b/arch/x86/include/asm/kvm_host.h > > index 555f4de47ef2..f7a661f35d1a 100644 > > --- a/arch/x86/include/asm/kvm_host.h > > +++ b/arch/x86/include/asm/kvm_host.h > > @@ -640,6 +640,7 @@ struct kvm_vcpu_arch { > > u64 smi_count; > > bool tpr_access_reporting; > > bool xsaves_enabled; > > + bool trap_nm; > > u64 ia32_xss; > > u64 microcode_version; > > u64 arch_capabilities; > > ... > > > @@ -763,6 +764,9 @@ void vmx_update_exception_bitmap(struct > kvm_vcpu *vcpu) > > vmcs_write32(PAGE_FAULT_ERROR_CODE_MATCH, match); > > } > > > > + if (vcpu->arch.trap_nm) > > + eb |= (1u << NM_VECTOR); > > + > > vmcs_write32(EXCEPTION_BITMAP, eb); > > } > > > > @@ -1960,6 +1964,13 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, > struct msr_data *msr_info) > > case MSR_KERNEL_GS_BASE: > > vmx_write_guest_kernel_gs_base(vmx, data); > > break; > > + case MSR_IA32_XFD: > > + ret = kvm_set_msr_common(vcpu, msr_info); > > + if (!ret && data) { > > + vcpu->arch.trap_nm = true; > > + vmx_update_exception_bitmap(vcpu); > > This is wrong, it fails to clear vcpu->arch.trap_nm and update the bitmap if > the > MSR is cleared. In concept you are right if just looking at this patch. It's pointless to trap #NM if guest xfd is cleared. But here we need think about patch22 which disables write interception for xfd. With that in consideration we use the 1st non-zero write as the hint indicating that guest might enable xfd-related usages thus always trap #NM after this point. It's not a good ordering, but Paolo wants to put the optimization in the end of this series. But we do need to put a clear comment here explaining the always-trap policy. > > But why even bother with an extra flag? Can't > vmx_update_exception_bitmap() get > the guest's MSR_IA32_XFD value and intercept #NM accordingly? Then you Above is the reason for the extra flag > could > even handle this fully in kvm_set_msr_common(), e.g. > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > index 2c9606380bca..c6c936d2b298 100644 > --- a/arch/x86/kvm/x86.c > +++ b/arch/x86/kvm/x86.c > @@ -3704,6 +3704,8 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, > struct msr_data *msr_info) > return 1; > > fpu_update_guest_xfd(&vcpu->arch.guest_fpu, data); > + /* Blah blah blah blah */ > + static_call(kvm_x86_update_exception_bitmap)(vcpu); > break; > case MSR_IA32_XFD_ERR: > if (!msr_info->host_initiated && > > > + } > > + break; > > #endif > > case MSR_IA32_SYSENTER_CS: > > if (is_guest_mode(vcpu)) > > @@ -4746,7 +4757,7 @@ static int handle_exception_nmi(struct kvm_vcpu > *vcpu) > > vect_info = vmx->idt_vectoring_info; > > intr_info = vmx_get_intr_info(vcpu); > > > > - if (is_machine_check(intr_info) || is_nmi(intr_info)) > > + if (is_machine_check(intr_info) || is_nmi(intr_info) || > is_nm(intr_info)) > > return 1; /* handled by handle_exception_nmi_irqoff() */ > > > > if (is_invalid_opcode(intr_info)) > > @@ -6350,6 +6361,12 @@ static void handle_interrupt_nmi_irqoff(struct > kvm_vcpu *vcpu, > > kvm_after_interrupt(vcpu); > > } > > > > +static void handle_exception_nm(struct kvm_vcpu *vcpu) > > This needs a different name, it's waaaay too close to the base > handle_exception_nmi(), > which runs with IRQs _on_. And please add "_irqoff" at the end. Maybe > handle_nm_fault_irqoff()? sounds good. > > > +{ > > + rdmsrl(MSR_IA32_XFD_ERR, vcpu->arch.guest_fpu.xfd_err); > > + kvm_queue_exception(vcpu, NM_VECTOR); > > +} > > + > > static void handle_exception_nmi_irqoff(struct vcpu_vmx *vmx) > > { > > const unsigned long nmi_entry = (unsigned long)asm_exc_nmi_noist; > > @@ -6358,6 +6375,9 @@ static void handle_exception_nmi_irqoff(struct > vcpu_vmx *vmx) > > /* if exit due to PF check for async PF */ > > if (is_page_fault(intr_info)) > > vmx->vcpu.arch.apf.host_apf_flags = > kvm_read_and_reset_apf_flags(); > > + /* if exit due to NM, handle before preemptions are enabled */ > > + else if (is_nm(intr_info)) > > Same naming complaint about this helper, it looks like an is_nmi() typo. > is_nm_fault()? will fix > > > + handle_exception_nm(&vmx->vcpu); > > /* Handle machine checks before interrupts are enabled */ > > else if (is_machine_check(intr_info)) > > kvm_machine_check();