On 13/10/21 10:42, Paolo Bonzini wrote:
> On 13/10/21 09:46, Liu, Jing2 wrote:
> >
> >> On 13/10/21 08:15, Liu, Jing2 wrote:
> >>> After KVM passthrough XFD to guest, when vmexit opening irq window
> >>> and KVM is interrupted, kernel softirq path can call
> >>> kernel_fpu_begin() to touch xsave state. This function does XSAVES.
> >>> If guest XFD[18] is 1, and with guest AMX state in register, then
> >>> guest AMX state is lost by XSAVES.
> >>
> >> Yes, the host value of XFD (which is zero) has to be restored after vmexit.
> >> See how KVM already handles SPEC_CTRL.
> >
> > I'm trying to understand why qemu's XFD is zero after kernel supports AMX.
>
> There are three copies of XFD:
>
> - the guest value stored in vcpu->arch.

OK, let's call it e.g. vcpu->arch.xfd

[...]
> - the internal KVM value attached to guest_fpu. When #NM happens, this
> one becomes zero.

> The CPU value is:
>
> - the guest_fpu value between kvm_load_guest_fpu and kvm_put_guest_fpu.
> This ensures that no state is lost in the case you are describing.
>

OK, you mean using guest_fpu as a KVM value. Let me describe the flow to
see if anything is missing.

When the #NM trap that enables passthrough happens, guest_fpu's XFD is set
to 0 and stays 0 from then on (the HW XFD is not changed and is still 1).
In the #NM trap, KVM allocates the buffer and re-injects a #NM exception
into the guest so that the guest kernel allocates its thread buffer.
Then, at the next vmexit, KVM syncs vcpu->arch.xfd, loads the guest_fpu
value (=0) and updates current->thread.fpu's XFD to 0 for kernel reference.

> - the OR of the guest value and the guest_fpu value while the guest runs
> (using either MSR load/save lists, or manual wrmsr like
> pt_guest_enter/pt_guest_exit). This ensures that the host has the
> opportunity to get a #NM exception, and allocate AMX state in the
> guest_fpu and in current->thread.fpu.
>
> > Yes, passthrough is done by two cases: one is guest #NM trapped;
> > another is guest clearing XFD before it generates #NM (this is possible for
> > guest), then passthrough.
> > For the two cases, we passthrough and allocate buffer for guest_fpu, and
> > current->thread.fpu.
>
> I think it's simpler to always wait for #NM, it will only happen once
> per vCPU. In other words, even if the guest clears XFD before it
> generates #NM, the guest_fpu's XFD remains nonzero

You mean the wrmsr trap doesn't do anything and just returns?
In that case, at the next vmenter the OR of the guest value
(vcpu->arch.xfd) and the guest_fpu value is still 1, so doesn't this
break the guest's assumption about the hardware? (The guest finds that
the wrmsr didn't work.)

Thanks,
Jing

> and an #NM vmexit is
> possible. After #NM the guest_fpu's XFD is zero; then passthrough can
> happen and the #NM vmexit trap can be disabled.
>
> Paolo
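
To make the scheme discussed above easier to follow, here is a minimal
user-space sketch of the three XFD copies and of the value the CPU would
hold in each phase. It is not KVM code and only mirrors the description in
this thread. struct xfd_model, enum xfd_phase and all xfd_* function names
are hypothetical, and the initial values (guest XFD[18] = 1, guest_fpu
XFD[18] = 1, host XFD = 0) are assumptions taken from the discussion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* XFD[18], the bit discussed in this thread. */
#define XFD_AMX_BIT (1ULL << 18)

/* Hypothetical model of the three copies of XFD; not a KVM structure. */
struct xfd_model {
	uint64_t guest_xfd;     /* guest value, "vcpu->arch.xfd" above */
	uint64_t guest_fpu_xfd; /* internal KVM value attached to guest_fpu */
	uint64_t host_xfd;      /* host value, zero according to the thread */
	bool nm_intercepted;    /* is the #NM vmexit trap still enabled? */
};

/* Hypothetical phases that decide which value the CPU holds. */
enum xfd_phase {
	XFD_PHASE_HOST,          /* outside load/put of the guest FPU */
	XFD_PHASE_GUEST_FPU,     /* between kvm_load_guest_fpu and
	                          * kvm_put_guest_fpu, guest not running */
	XFD_PHASE_GUEST_RUNNING, /* between vmentry and vmexit */
};

/* Assumed starting point: the guest has not touched AMX yet. */
static void xfd_model_init(struct xfd_model *m)
{
	m->guest_xfd = XFD_AMX_BIT;
	m->guest_fpu_xfd = XFD_AMX_BIT;
	m->host_xfd = 0;
	m->nm_intercepted = true;
}

/* Value the CPU's XFD would hold in the given phase. */
static uint64_t xfd_hw_value(const struct xfd_model *m, enum xfd_phase phase)
{
	switch (phase) {
	case XFD_PHASE_GUEST_RUNNING:
		/* OR of the guest value and the guest_fpu value. */
		return m->guest_xfd | m->guest_fpu_xfd;
	case XFD_PHASE_GUEST_FPU:
		return m->guest_fpu_xfd;
	default:
		return m->host_xfd;
	}
}

/* "Always wait for #NM": a guest WRMSR only updates the guest copy. */
static void xfd_guest_wrmsr(struct xfd_model *m, uint64_t val)
{
	m->guest_xfd = val;
}

/*
 * #NM vmexit: buffer allocation is not modelled here. The guest_fpu
 * copy becomes zero and the #NM trap can then be disabled.
 */
static void xfd_nm_vmexit(struct xfd_model *m)
{
	assert(m->nm_intercepted); /* only reachable while #NM is trapped */
	m->guest_fpu_xfd = 0;
	m->nm_intercepted = false;
}
```

In this model, if xfd_guest_wrmsr(&m, 0) is called before any #NM,
xfd_hw_value(&m, XFD_PHASE_GUEST_RUNNING) still returns XFD_AMX_BIT until
xfd_nm_vmexit(&m) has run. That is the situation the question above is
about.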