Re: [PATCH 15/19] kvm: x86: Save and restore guest XFD_ERR properly

Thomas Gleixner <tglx@xxxxxxxxxxxxx> · Sat, 11 Dec 2021 01:10:47 +0100

On Tue, Dec 07 2021 at 19:03, Yang Zhong wrote:
> diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
> index 5089f2e7dc22..9811dc98d550 100644
> --- a/arch/x86/kernel/fpu/core.c
> +++ b/arch/x86/kernel/fpu/core.c
> @@ -238,6 +238,7 @@ bool fpu_alloc_guest_fpstate(struct fpu_guest *gfpu)
>  	fpstate->is_guest	= true;
>  
>  	gfpu->fpstate		= fpstate;
> +	gfpu->xfd_err           = XFD_ERR_GUEST_DISABLED;

This wants to be part of the previous patch, which introduces the field.

>  	gfpu->user_xfeatures	= fpu_user_cfg.default_features;
>  	gfpu->user_perm		= fpu_user_cfg.default_features;
>  	fpu_init_guest_permissions(gfpu);
> @@ -297,6 +298,7 @@ int fpu_swap_kvm_fpstate(struct fpu_guest *guest_fpu, bool enter_guest)
>  		fpu->fpstate = guest_fps;
>  		guest_fps->in_use = true;
>  	} else {
> +		fpu_save_guest_xfd_err(guest_fpu);

Hmm. See below.

>  		guest_fps->in_use = false;
>  		fpu->fpstate = fpu->__task_fpstate;
>  		fpu->__task_fpstate = NULL;
> @@ -4550,6 +4550,9 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
>  		kvm_steal_time_set_preempted(vcpu);
>  	srcu_read_unlock(&vcpu->kvm->srcu, idx);
>  
> +	if (vcpu->preempted)
> +		fpu_save_guest_xfd_err(&vcpu->arch.guest_fpu);

I'm not really excited about the thought of an exception cause register
in guest clobbered state.

Aside from that, I really have to ask: why is all this needed?

#NM in the guest is slow path, right? So why are you trying to optimize
for it?

The straightforward solution to this is:

    1) Trap #NM and MSR_XFD_ERR write

    2) When the guest triggers #NM it takes a VMEXIT and the host
       does:

                rdmsrl(MSR_XFD_ERR, vcpu->arch.guest_fpu.xfd_err);

       injects the #NM and goes on.

    3) When the guest writes to MSR_XFD_ERR it takes a VMEXIT and
       the host does:

           vcpu->arch.guest_fpu.xfd_err = msrval;
           wrmsrl(MSR_XFD_ERR, msrval);

       and goes back.

    4) Before entering the preemption disabled section of the VCPU loop
       do:

           if (vcpu->arch.guest_fpu.xfd_err)
                   wrmsrl(MSR_XFD_ERR, vcpu->arch.guest_fpu.xfd_err);

    5) Before leaving the preemption disabled section of the VCPU loop
       do:

           if (vcpu->arch.guest_fpu.xfd_err)
                   wrmsrl(MSR_XFD_ERR, 0);

It's really that simple and pretty much 0 overhead for the regular case.
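
To make the flow in steps 2) to 5) concrete, here is a rough user-space
model of it. This is only a sketch and not kernel code: the MSR is a plain
variable, and every name in it (mock_msr_xfd_err, mock_rdmsr(), mock_wrmsr(),
struct mock_guest_fpu and the four helpers) is hypothetical and invented for
illustration. Step 1), the trap setup, is assumed to have happened already.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the hardware MSR_XFD_ERR register. */
static uint64_t mock_msr_xfd_err;
/* Counts writes so the "no overhead in the regular case" claim is visible. */
static unsigned int mock_wrmsr_count;

/* Hypothetical replacements for rdmsrl()/wrmsrl() on MSR_XFD_ERR. */
static uint64_t mock_rdmsr(void)
{
	return mock_msr_xfd_err;
}

static void mock_wrmsr(uint64_t val)
{
	mock_msr_xfd_err = val;
	mock_wrmsr_count++;
}

/* Hypothetical shadow of the guest's XFD_ERR value. */
struct mock_guest_fpu {
	uint64_t xfd_err;
};

/* Step 2: #NM VMEXIT. Latch the hardware value, then #NM gets injected. */
static void on_guest_nm_exit(struct mock_guest_fpu *gfpu)
{
	gfpu->xfd_err = mock_rdmsr();
}

/* Step 3: guest write to MSR_XFD_ERR traps. Update shadow and hardware. */
static void on_guest_xfd_err_write(struct mock_guest_fpu *gfpu, uint64_t val)
{
	gfpu->xfd_err = val;
	mock_wrmsr(val);
}

/* Step 4: before the preemption disabled section, restore if non-zero. */
static void before_preempt_disabled_section(const struct mock_guest_fpu *gfpu)
{
	if (gfpu->xfd_err)
		mock_wrmsr(gfpu->xfd_err);
}

/* Step 5: after the preemption disabled section, clear if non-zero. */
static void after_preempt_disabled_section(const struct mock_guest_fpu *gfpu)
{
	if (gfpu->xfd_err)
		mock_wrmsr(0);
}
```

As long as the shadow value is zero, steps 4) and 5) never touch the MSR,
which is the regular case. The MSR is only written while the guest has an
unhandled #NM cause pending.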

If the guest triggers #NM with a high frequency then taking the VMEXITs
is the least of the problems. That's not a realistic use case, really.

Hmm?

Thanks,

        tglx