RE: [patch 13/31] x86/fpu: Move KVMs FPU swapping to FPU core

"Liu, Jing2" <jing2.liu@xxxxxxxxx> · Wed, 13 Oct 2021 06:15:56 +0000

> On 12/10/21 02:00, Thomas Gleixner wrote:
> > Swapping the host/guest FPU is directly fiddling with FPU internals
> > which requires 5 exports. The upcoming support of dymanically enabled
> > states would even need more.
> >
> > Implement a swap function in the FPU core code and export that instead.
> >
> > Signed-off-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> > Cc: kvm@xxxxxxxxxxxxxxx
> > Cc: Paolo Bonzini <pbonzini@xxxxxxxxxx>
> > ---
> >   arch/x86/include/asm/fpu/api.h      |    8 +++++
> >   arch/x86/include/asm/fpu/internal.h |   15 +---------
> >   arch/x86/kernel/fpu/core.c          |   30 ++++++++++++++++++---
> >   arch/x86/kernel/fpu/init.c          |    1
> >   arch/x86/kernel/fpu/xstate.c        |    1
> >   arch/x86/kvm/x86.c                  |   51 +++++++-----------------------------
> >   arch/x86/mm/extable.c               |    2 -
> >   7 files changed, 48 insertions(+), 60 deletions(-)
> >

When looking into the tglx/devel.git x86/fpu for the full #1-#4 
series and the KVM AMX support, I'd like to talk two things
 as follows,

1. KVM dynamic allocation API:
Since KVM also uses dynamic allocation, after KVM detects guest
requesting AMX by #NM trap, KVM need alloc extra buffer for
this vcpu's current->thread.fpu.fpstate and guest_fpu related.
So far, the kernel itself has such API like fpstate_realloc(), but it's
static. How about making a common function usable for KVM?

2. There exists a case that *guest AMX state can be lost*:

After KVM passthrough XFD to guest, when vmexit opening
irq window and KVM is interrupted, kernel softirq path can call
kernel_fpu_begin() to touch xsave state. This function does
XSAVES. If guest XFD[18] is 1, and with guest AMX state in register,
then guest AMX state is lost by XSAVES.

The detailed example call trace in commit
commit 2620fe268e80d667a94553cd37a94ccaa2cb8c83
Author: Sean Christopherson <seanjc@xxxxxxxxxx>
Date:   Fri Jan 17 11:30:51 2020 -0800

    KVM: x86: Revert "KVM: X86: Fix fpu state crash in kvm guest"

    Reload the current thread's FPU state, which contains the guest's FPU
    state, to the CPU registers if necessary during vcpu_enter_guest().
    TIF_NEED_FPU_LOAD can be set any time control is transferred out of
    KVM,
    e.g. if I/O is triggered during a KVM call to get_user_pages() or if a
    softirq occurs while KVM is scheduled in.
    ...
   A sample trace triggered by warning if TIF_NEED_FPU_LOAD is set while
    vcpu state is loaded:

     <IRQ>
      gcmaes_crypt_by_sg.constprop.12+0x26e/0x660
      ? 0xffffffffc024547d
      ? __qdisc_run+0x83/0x510
      ? __dev_queue_xmit+0x45e/0x990
      ...
      ? do_IRQ+0x7f/0xd0
      ? common_interrupt+0xf/0xf
      </IRQ>
      ? irq_entries_start+0x20/0x660
      ? vmx_get_interrupt_shadow+0x2f0/0x710 [kvm_intel]
      ? kvm_set_msr_common+0xfc7/0x2380 [kvm]
      ? recalibrate_cpu_khz+0x10/0x10
      ? ktime_get+0x3a/0xa0
      ? kvm_arch_vcpu_ioctl_run+0x107/0x560 [kvm]
      ? kvm_init+0x6bf/0xd00 [kvm]

For this case, I think one way is kernel doing something before XSAVES
for KVM thread; another way is let KVM fix: maintaining a zero XFD
value (by current->state.fpu.fpstate->xfd = 0) after vcpu fpu state is 
loaded and restore real guest XFD value before vmenter. 
Logic as follows.

after vmexit:
if XFD is passthrough
then
	sync guest XFD to vmx->xfd;
	set XFD to current->state.fpu.fpstate->xfd (= 0)
	__this_cpu_write(xfd_state, 0);

before vmenter (irq is disabled):
if passthrough
then
	restore to real guest XFD by vmx->xfd;

vcpu_run: (if XFD is passthrough)
load: swap from qemu's to a zero XFD
put: swap zero to qemu's

Thanks,
Jing

[...]