On Thu, Oct 14 2021 at 11:01, Paolo Bonzini wrote:
> On 14/10/21 10:02, Liu, Jing2 wrote:
> Based on the input from Andy and Thomas, the new way would be like this:
>
> 1) host_fpu must always be checked for reallocation in
> kvm_load_guest_fpu (or in the FPU functions that it calls, that depends
> on the rest of Thomas's patches). That's because arch_prctl can enable
> AMX for QEMU at any point after KVM_CREATE_VCPU.

No.

   1) QEMU starts
   2) QEMU requests permissions via prctl()
   3) QEMU creates vCPU threads

Doing it the other way around makes no sense at all and won't work.

> 2) every use of vcpu->arch.guest_supported_xcr0 is changed to only
> include those dynamic-feature bits that were enabled via arch_prctl.
> That is, something like:
>
> static u64 kvm_guest_supported_cr0(struct kvm_vcpu *vcpu)
> {
>         return vcpu->arch.guest_supported_xcr0 &
>                 (~xfeatures_mask_user_dynamic | \
>                  current->thread.fpu.dynamic_state_perm);

Bah. You can't get enough from poking in internals, right?

vcpu_create()

  fpu_init_fpstate_user(guest_fpu, supported_xcr0)

That will (it does not today) do:

     guest_fpu::__state_perm = supported_xcr0 & xstate_get_group_perm();

for you. Once.

Then you have the information you need right in the guest FPU.

See?

> So something like this pseudocode is called by both XCR0 and XFD writes:
>
> int kvm_alloc_fpu_dynamic_features(struct kvm_vcpu *vcpu)
> {
>         u64 allowed_dynamic = current->thread.fpu.dynamic_state_perm;

That's not a valid assumption.

>         u64 enabled_dynamic =
>                 vcpu->arch.xcr0 & xfeatures_mask_user_dynamic;
>
>         /* All dynamic features have to be arch_prctl'd first. */
>         WARN_ON_ONCE(enabled_dynamic & ~allowed_dynamic);
>
>         if (!vcpu->arch.xfd_passthrough) {
>                 /* All dynamic states will #NM? Wait and see. */
>                 if ((enabled_dynamic & ~vcpu->arch.xfd) == 0)
>                         return 0;
>
>                 kvm_x86_ops.enable_xfd_passthrough(vcpu);
>         }
>
>         /* current->thread.fpu was already handled by arch_prctl. */

No. current->thread.fpu has the default buffer unless QEMU used AMX or
something forced it to allocate a larger buffer.

>         return fpu_alloc_features(vcpu->guest_fpu,
>                 vcpu->guest_fpu.dynamic_state_perm | enabled_dynamic);

This unconditionally calls into that allocation for every XCR0/XFD
trap?

> }

Also you really should not wait until _all_ dynamic states are cleared
in guest XFD. Because a guest which has bit 18 and 19 available but only
uses one of them is going to trap on every other context switch due to
XFD writes.

So you check for

    (guest_xfd & guest_perm) != guest_perm

and

    (guest_xcr0 & guest_perm) != 0

If both are true, then you reallocate the buffers for _all_ permitted
states _and_ set XFD to pass through.

Thanks,

        tglx
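
For illustration only, the two-part check at the end of the mail can be
modelled as a standalone userspace predicate. This is a minimal sketch,
not kernel code: guest_needs_full_realloc(), FEAT_BIT18 and FEAT_BIT19
are hypothetical names made up for the example, and the three masks are
plain integers rather than fields of any real KVM or FPU structure.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical masks for the "bit 18 and 19" example in the mail. */
#define FEAT_BIT18 (UINT64_C(1) << 18)
#define FEAT_BIT19 (UINT64_C(1) << 19)

/*
 * Hypothetical helper, not a kernel API.
 *
 * guest_perm: dynamic features the guest is permitted to use
 * guest_xcr0: value the guest wrote to XCR0
 * guest_xfd:  value the guest wrote to XFD (bit set = feature disabled)
 *
 * Returns true when buffers for all permitted states should be
 * reallocated and XFD switched to pass-through.
 */
static bool guest_needs_full_realloc(uint64_t guest_xcr0,
                                     uint64_t guest_xfd,
                                     uint64_t guest_perm)
{
        /* At least one permitted feature is no longer disabled in XFD. */
        bool xfd_cleared = (guest_xfd & guest_perm) != guest_perm;

        /* At least one permitted feature is enabled in XCR0. */
        bool xcr0_enabled = (guest_xcr0 & guest_perm) != 0;

        return xfd_cleared && xcr0_enabled;
}
```

With both bits permitted and enabled in XCR0, the predicate turns true
as soon as the guest clears either one in XFD, rather than waiting for
every dynamic state to be cleared.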