On 14/10/21 10:02, Liu, Jing2 wrote:
In principle I don't like it very much; it would be nicer to say "you
enable it for QEMU itself via arch_prctl(ARCH_SET_STATE_ENABLE), and for
the guests via ioctl(KVM_SET_CPUID2)". But I can see why you want to
keep things simple, so it's not a strong objection at all.
Does this mean that KVM allocate 3 buffers via
1) Qemu's request, instead of via 2) guest XCR0 trap?
Based on the input from Andy and Thomas, the new way would be like this:
1) host_fpu must always be checked for reallocation in
kvm_load_guest_fpu (or in the FPU functions that it calls, that depends
on the rest of Thomas's patches). That's because arch_prctl can enable
AMX for QEMU at any point after KVM_CREATE_VCPU.
2) every use of vcpu->arch.guest_supported_xcr0 is changed to only
include those dynamic-feature bits that were enabled via arch_prctl.
That is, something like:
static u64 kvm_guest_supported_cr0(struct kvm_vcpu *vcpu)
{
return vcpu->arch.guest_supported_xcr0 &
(~xfeatures_mask_user_dynamic | \
current->thread.fpu.dynamic_state_perm);
}
3) Even with passthrough disabled, the guest can run with XFD set to
vcpu->arch.guest_xfd (and likewise for XFD_ERR) which is much simpler
than trapping #NM. The traps for writing XCR0 and XFD are used to
allocate dynamic state for guest_fpu, and start the passthrough of XFD
and XFD_ERR. What we need is:
- if a dynamic state has XCR0[n]=0, bit n will never be set in XFD_ERR
and the state will never be dirtied by the guest.
- if a dynamic state has XCR0[n]=1, but all enabled dynamic states have
XFD[n]=1, the guest is not able to dirty any dynamic XSAVE state,
because they all have either XCR0[n]=0 or XFD[n]=1. An attempt to do so
will cause an #NM trap and set the bit in XFD_ERR.
- if a dynamic state has XCR0[n]=1 and XFD[n]=0, the state for bit n is
allocated in guest_fpu, and it can also disable the vmexits for XFD and
XFD_ERR.
Therefore:
- if passthrough is disabled, the XCR0 and XFD write traps can check
guest_xcr0 & ~guest_xfd. If it includes a dynamic state bit, dynamic
state is allocated for all bits enabled in guest_xcr0 and passthrough is
started; this should happen shortly after the guest gets its first #NM
trap for AMX.
- if passthrough is enabled, the XCR0 write trap must still ensure that
dynamic state is allocated for all bits enabled in guest_xcr0.
So something like this pseudocode is called by both XCR0 and XFD writes:
int kvm_alloc_fpu_dynamic_features(struct kvm_vcpu *vcpu)
{
u64 allowed_dynamic = current->thread.fpu.dynamic_state_perm;
u64 enabled_dynamic =
vcpu->arch.xcr0 & xfeatures_mask_user_dynamic;
/* All dynamic features have to be arch_prctl'd first. */
WARN_ON_ONCE(enabled_dynamic & ~allowed_dynamic);
if (!vcpu->arch.xfd_passthrough) {
/* All dynamic states will #NM? Wait and see. */
if ((enabled_dynamic & ~vcpu->arch.xfd) == 0)
return 0;
kvm_x86_ops.enable_xfd_passthrough(vcpu);
}
/* current->thread.fpu was already handled by arch_prctl. */
return fpu_alloc_features(vcpu->guest_fpu,
vcpu->guest_fpu.dynamic_state_perm | enabled_dynamic);
}
Paolo