Paolo, On Mon, Dec 13 2021 at 20:45, Thomas Gleixner wrote: > On Mon, Dec 13 2021 at 16:06, Paolo Bonzini wrote: >> That said, I think xfd_update_state should not have an argument. >> current->thread.fpu.fpstate->xfd is the only fpstate that should be >> synced with the xfd_state per-CPU variable. > > I'm looking into this right now. The whole restore versus runtime thing > needs to be handled differently. We need to look at different things here: 1) XFD MSR write emulation 2) XFD MSR synchronization when write emulation is disabled 3) Guest restore #1 and #2 are in the context of vcpu_run() and vcpu->arch.guest_fpu.fpstate == current->thread.fpu.fpstate while #3 has: vcpu->arch.guest_fpu.fpstate != current->thread.fpu.fpstate #2 is only updating fpstate->xfd and the per CPU shadow. So the state synchronization wants to be something like this: void fpu_sync_guest_xfd_state(void) { struct fpstate *fps = current->thread.fpu.fpstate; lockdep_assert_irqs_disabled(); if (fpu_state_size_dynamic()) { rdmsrl(MSR_IA32_XFD, fps->xfd); __this_cpu_write(xfd_state, fps->xfd); } } EXPORT_SYMBOL_GPL(fpu_sync_guest_xfd_state); No wrmsrl() because the MSR is already up do date. The important part is that fpstate->xfd and the shadow state are updated so that after reenabling preemption the context switch FPU logic works correctly. #1 and #3 can trigger a reallocation of guest_fpu.fpstate and can fail. But this is also true for XSETBV emulation and XCR0 restore. For #1 modifying fps->xfd in the KVM code before calling into the FPU code is just _wrong_ because if the guest removes the XFD restriction then it must be ensured that the buffer is sized correctly _before_ this is updated. For #3 it's not really important, but I still try to wrap my head around the whole picture vs. XCR0. There are two options: 1) Require strict ordering of XFD and XCR0 update to avoid pointless buffer expansion, i.e. XFD before XCR0. Because if XCR0 is updated while guest_fpu->fpstate.xfd is still in init state (0) and XCR0 contains extended features, then the buffer would be expanded because XFD does not mask the extended features out. When XFD is restored with a non-zero value, it's too late already. 2) Ignore buffer expansion up to the point where XSTATE restore happens and evaluate guest XCR0 and guest_fpu->fpstate.xfd there. I'm leaning towards #1 because that means we have exactly _ONE_ place where we need to deal with buffer expansion. If Qemu gets the ordering wrong it wastes memory per vCPU, *shrug*. Thanks, tglx