RE: [patch 13/31] x86/fpu: Move KVMs FPU swapping to FPU core

"Liu, Jing2" <jing2.liu@xxxxxxxxx> · Thu, 14 Oct 2021 11:30:46 +0000

On 10/14/2021 5:01 PM, Paolo Bonzini wrote:
> On 14/10/21 10:02, Liu, Jing2 wrote:
> >> In principle I don't like it very much; it would be nicer to say "you
> >> enable it for QEMU itself via arch_prctl(ARCH_SET_STATE_ENABLE), and
> >> for the guests via ioctl(KVM_SET_CPUID2)".  But I can see why you
> >> want to keep things simple, so it's not a strong objection at all.
> >
> > Does this mean that KVM allocate 3 buffers via
> > 1) Qemu's request, instead of via 2) guest XCR0 trap?
> 
> Based on the input from Andy and Thomas, the new way would be like this:
> 
> 1) host_fpu must always be checked for reallocation in kvm_load_guest_fpu
> (or in the FPU functions that it calls, that depends on the rest of Thomas's
> patches).  That's because arch_prctl can enable AMX for QEMU at any point
> after KVM_CREATE_VCPU.
> 
> 2) every use of vcpu->arch.guest_supported_xcr0 is changed to only include
> those dynamic-feature bits that were enabled via arch_prctl.
> That is, something like:
> 
> static u64 kvm_guest_supported_cr0(struct kvm_vcpu *vcpu) {
> 	return vcpu->arch.guest_supported_xcr0 &
> 		(~xfeatures_mask_user_dynamic | \
> 		 current->thread.fpu.dynamic_state_perm);
> }
> 
> 3) Even with passthrough disabled, the guest can run with XFD set to
> vcpu->arch.guest_xfd (and likewise for XFD_ERR) which is much simpler
> than trapping #NM.  The traps for writing XCR0 and XFD are used to allocate
> dynamic state for guest_fpu, and start the passthrough of XFD and XFD_ERR.
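
A minimal user-space sketch of the masking in 2), only to make the bit
arithmetic concrete. The function name, the plain u64 parameters, and the
subset check are assumptions for illustration; the real code would read
xfeatures_mask_user_dynamic and current->thread.fpu.dynamic_state_perm
as in the snippet above.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the helper quoted above. It keeps every
 * non-dynamic bit of guest_supported_xcr0 and keeps a dynamic bit only
 * if permission for it was granted (e.g. via arch_prctl).
 */
static uint64_t sketch_guest_supported_xcr0(uint64_t guest_supported_xcr0,
					    uint64_t dynamic_mask,
					    uint64_t dynamic_perm)
{
	/* Assumption: permissions are only ever granted for dynamic features. */
	assert((dynamic_perm & ~dynamic_mask) == 0);

	return guest_supported_xcr0 & (~dynamic_mask | dynamic_perm);
}
```

With bit 18 (AMX tile data) as the only dynamic feature, a
guest_supported_xcr0 of 0x60007 is reduced to 0x20007 while no permission
has been granted, and stays 0x60007 once bit 18 is permitted.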

For XFD_ERR, since HW can change it automatically, I don't think
write-protecting it is needed. KVM also doesn't need to trap rdmsr of it,
because KVM has no use for the value.

I guess what we're worrying about is that when KVM is sched_out, a nonzero
XFD_ERR can be lost because another host thread overwrites it. We can save
the guest XFD_ERR in sched_out and restore it before the next vmenter. The
kernel is assumed not to use AMX, so a softirq won't cause it to be lost.
I think this solves the problem, so we can directly pass through reads and
writes of it, with no need to rdmsr(XFD_ERR) on vmexit.
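
A rough user-space model of that save/restore flow, as a sketch only. The
MSR is simulated with a plain variable, and every name here (hw_xfd_err,
struct vcpu_sketch, the sketch_* functions) is hypothetical rather than
existing KVM code; real code would use rdmsrl()/wrmsrl() from the
sched_out and vcpu-run paths.

```c
#include <assert.h>
#include <stdint.h>

/* Simulated IA32_XFD_ERR; stands in for the physical MSR. */
static uint64_t hw_xfd_err;

/* Hypothetical per-vCPU bookkeeping, not actual KVM fields. */
struct vcpu_sketch {
	uint64_t guest_xfd_err;	/* guest value captured at sched_out */
	int xfd_err_saved;	/* nonzero while a reload is pending */
};

/* vCPU thread is scheduled out: remember the guest's XFD_ERR. */
static void sketch_sched_out(struct vcpu_sketch *v)
{
	v->guest_xfd_err = hw_xfd_err;
	v->xfd_err_saved = 1;
}

/* Another host thread runs and may leave its own value in the MSR. */
static void sketch_host_thread_runs(uint64_t host_value)
{
	hw_xfd_err = host_value;
}

/* Before the next vmenter: put the guest's value back if one was saved. */
static void sketch_before_vmenter(struct vcpu_sketch *v)
{
	if (v->xfd_err_saved) {
		hw_xfd_err = v->guest_xfd_err;
		v->xfd_err_saved = 0;
	}
}

/* One full cycle; returns what the guest would see after re-entry. */
static uint64_t sketch_roundtrip(uint64_t guest_value, uint64_t host_value)
{
	struct vcpu_sketch v = { 0 };

	hw_xfd_err = guest_value;	/* HW set XFD_ERR while the guest ran */
	sketch_sched_out(&v);
	sketch_host_thread_runs(host_value);
	sketch_before_vmenter(&v);
	assert(!v.xfd_err_saved);	/* nothing left pending after reload */
	return hw_xfd_err;
}
```

In this model the guest value survives whatever the host thread wrote in
between, and no rdmsr is needed on the vmexit path itself.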

Thanks,
Jing