> From: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> Sent: Saturday, December 11, 2021 8:11 AM
>
> On Tue, Dec 07 2021 at 19:03, Yang Zhong wrote:
> > diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
> > index 5089f2e7dc22..9811dc98d550 100644
> > --- a/arch/x86/kernel/fpu/core.c
> > +++ b/arch/x86/kernel/fpu/core.c
> > @@ -238,6 +238,7 @@ bool fpu_alloc_guest_fpstate(struct fpu_guest *gfpu)
> >  	fpstate->is_guest = true;
> >
> >  	gfpu->fpstate = fpstate;
> > +	gfpu->xfd_err = XFD_ERR_GUEST_DISABLED;
>
> This wants to be part of the previous patch, which introduces the field.
>
> >  	gfpu->user_xfeatures = fpu_user_cfg.default_features;
> >  	gfpu->user_perm = fpu_user_cfg.default_features;
> >  	fpu_init_guest_permissions(gfpu);
> > @@ -297,6 +298,7 @@ int fpu_swap_kvm_fpstate(struct fpu_guest *guest_fpu, bool enter_guest)
> >  		fpu->fpstate = guest_fps;
> >  		guest_fps->in_use = true;
> >  	} else {
> > +		fpu_save_guest_xfd_err(guest_fpu);
>
> Hmm. See below.
>
> >  		guest_fps->in_use = false;
> >  		fpu->fpstate = fpu->__task_fpstate;
> >  		fpu->__task_fpstate = NULL;
> > @@ -4550,6 +4550,9 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
> >  		kvm_steal_time_set_preempted(vcpu);
> >  	srcu_read_unlock(&vcpu->kvm->srcu, idx);
> >
> > +	if (vcpu->preempted)
> > +		fpu_save_guest_xfd_err(&vcpu->arch.guest_fpu);
>
> I'm not really excited about the thought of an exception cause register
> in guest clobbered state.
>
> Aside from that I really have to ask the question why all this is needed?
>
> #NM in the guest is a slow path, right? So why are you trying to optimize
> for it?

This is really good information. The current logic is obviously based on
the assumption that #NM is frequently triggered.

>
> The straightforward solution to this is:
>
>    1) Trap #NM and MSR_XFD_ERR write

and the #NM vmexit handler should be called in kvm_x86_handle_exit_irqoff()
before preemption is enabled, otherwise there is still a small window where
MSR_XFD_ERR might be clobbered after preemption is enabled and before the
#NM handler is actually called.

>
>    2) When the guest triggers #NM it takes a VMEXIT and the host
>       does:
>
>            rdmsrl(MSR_XFD_ERR, vcpu->arch.guest_fpu.xfd_err);
>
>       injects the #NM and goes on.
>
>    3) When the guest writes to MSR_XFD_ERR it takes a VMEXIT and
>       the host does:
>
>            vcpu->arch.guest_fpu.xfd_err = msrval;
>            wrmsrl(MSR_XFD_ERR, msrval);
>
>       and goes back.
>
>    4) Before entering the preemption disabled section of the VCPU loop
>       do:
>
>            if (vcpu->arch.guest_fpu.xfd_err)
>                wrmsrl(MSR_XFD_ERR, vcpu->arch.guest_fpu.xfd_err);
>
>    5) Before leaving the preemption disabled section of the VCPU loop
>       do:
>
>            if (vcpu->arch.guest_fpu.xfd_err)
>                wrmsrl(MSR_XFD_ERR, 0);
>
> It's really that simple and pretty much 0 overhead for the regular case.

Much cleaner.

>
> If the guest triggers #NM with a high frequency then taking the VMEXITs
> is the least of the problems. That's not a realistic use case, really.
>
> Hmm?
>
> Thanks,
>
>         tglx

Thanks
Kevin
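
P.S. to double check my reading of 1)-5), here is a rough sketch of how the
pieces could hang together in the vcpu run path. The helper names and hook
points below (handle_nm_exit_irqoff, set_guest_xfd_err, kvm_load_guest_xfd_err,
kvm_put_guest_xfd_err) are made up for illustration only, not existing
functions, and MSR_XFD_ERR is spelled as in your mail:

    #include <linux/kvm_host.h>
    #include <asm/msr.h>

    /*
     * 2) #NM VM-exit, handled from the irqoff exit path before preemption
     *    is enabled again: snapshot the guest's MSR_XFD_ERR into the shadow
     *    field, then the #NM gets reflected into the guest as usual.
     */
    static void handle_nm_exit_irqoff(struct kvm_vcpu *vcpu)
    {
            rdmsrl(MSR_XFD_ERR, vcpu->arch.guest_fpu.xfd_err);
    }

    /*
     * 3) WRMSR(MSR_XFD_ERR) intercept: mirror the guest value into the
     *    shadow field and into the hardware MSR, then resume the guest.
     */
    static int set_guest_xfd_err(struct kvm_vcpu *vcpu, u64 msrval)
    {
            vcpu->arch.guest_fpu.xfd_err = msrval;
            wrmsrl(MSR_XFD_ERR, msrval);
            return 0;
    }

    /* 4) Before entering the preemption disabled section of the vcpu loop */
    static void kvm_load_guest_xfd_err(struct kvm_vcpu *vcpu)
    {
            if (vcpu->arch.guest_fpu.xfd_err)
                    wrmsrl(MSR_XFD_ERR, vcpu->arch.guest_fpu.xfd_err);
    }

    /* 5) Before leaving the preemption disabled section of the vcpu loop */
    static void kvm_put_guest_xfd_err(struct kvm_vcpu *vcpu)
    {
            if (vcpu->arch.guest_fpu.xfd_err)
                    wrmsrl(MSR_XFD_ERR, 0);
    }

i.e. the #NM exit in 2) is handled from kvm_x86_handle_exit_irqoff() as noted
above, and 4)/5) bracket the preemption disabled region so a context switch
never observes a stale guest value in the hardware MSR.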