On 08/02/21 18:31, Sean Christopherson wrote:
On Mon, Feb 08, 2021, Paolo Bonzini wrote:
On 07/02/21 16:42, Jing Liu wrote:
In KVM, "guest_fpu" serves any guest task running on this vcpu
across vmexit and vmenter. We provide a pre-allocated guest_fpu space
and the entire "guest_fpu.state_mask" to avoid detecting dynamic features
for each vcpu task. Meanwhile, to ensure that xsaves/xrstors handle the
guest state correctly, set IA32_XFD to zero during vmexit and
vmenter.
Most guests will not need the whole xstate feature set. So perhaps you
could set XFD to the host value | the guest value, trap #NM if the host XFD
is zero, and possibly reflect the exception to the guest's XFD and XFD_ERR.
In addition, loading the guest XFD MSRs should use the MSR autoload feature
(add_atomic_switch_msr).
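
As a rough illustration of the suggestion above: a set bit in XFD disables
the corresponding xstate feature (its first use raises #NM), so loading
"host value | guest value" keeps a feature disarmed whenever either side has
disabled it. The sketch below is a user-space model only. The helper names
effective_xfd and xfd_feature_disabled are hypothetical and do not come from
the patch; bit 18 (AMX tile data) is used purely as an example feature.

```c
#include <stdint.h>

/* Example feature bit: xstate component 18 (AMX tile data). */
#define XFEATURE_MASK_XTILE_DATA (UINT64_C(1) << 18)

/*
 * Hypothetical helper, not from the patch: compute the XFD value that
 * would be live while the guest runs. A feature is disabled if either
 * the host or the guest has its XFD bit set.
 */
static inline uint64_t effective_xfd(uint64_t host_xfd, uint64_t guest_xfd)
{
    return host_xfd | guest_xfd;
}

/*
 * Hypothetical helper: nonzero if the given feature is disabled under
 * this XFD value, meaning its first use would raise #NM.
 */
static inline int xfd_feature_disabled(uint64_t xfd, uint64_t feature_mask)
{
    return (xfd & feature_mask) != 0;
}
```

How the resulting #NM is then routed (handled by the host, or reflected to
the guest's XFD_ERR) is a separate policy decision that this sketch does
not model.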
Why do you say that? I would strongly prefer to use the load lists only if they
are absolutely necessary. I don't think that's the case here, as I can't
imagine accessing FPU state in NMI context is allowed, at least not without a
big pile of save/restore code.
I was thinking more of the added vmentry/vmexit overhead due to
xfd_guest_enter and xfd_guest_exit.
That said, the case where we saw MSR autoload as faster involved EFER,
and we decided that it was due to TLB flushes (commit f6577a5fa15d,
"x86, kvm, vmx: Always use LOAD_IA32_EFER if available", 2014-11-12).
Do you know if RDMSR/WRMSR is always slower than MSR autoload?
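
For reference, the two alternatives differ in who performs the switch: with
explicit switching the hypervisor executes WRMSR itself around entry and
exit, while with autoload it fills entries in the VM-entry/VM-exit MSR-load
areas and the CPU loads them as part of the transition. Below is a minimal
user-space model of such a list. The 16-byte entry layout (index, reserved,
value) follows the Intel SDM; struct msr_autoload, MAX_AUTOLOAD and
autoload_set are hypothetical stand-ins for illustration and are not KVM's
add_atomic_switch_msr.

```c
#include <stddef.h>
#include <stdint.h>

#define MSR_IA32_XFD 0x000001c4u
#define MAX_AUTOLOAD 8 /* arbitrary capacity chosen for this sketch */

/* One entry of a VMX MSR-load/store area: 16 bytes. */
struct msr_entry {
    uint32_t index;    /* MSR number */
    uint32_t reserved; /* must be zero */
    uint64_t value;    /* value to load */
};

/* Hypothetical container for one autoload list. */
struct msr_autoload {
    size_t nr;
    struct msr_entry e[MAX_AUTOLOAD];
};

/*
 * Hypothetical helper: add an MSR to the list, or update its value if
 * it is already present. Returns 0 on success, -1 if the list is full.
 */
static int autoload_set(struct msr_autoload *l, uint32_t msr, uint64_t val)
{
    size_t i;

    for (i = 0; i < l->nr; i++) {
        if (l->e[i].index == msr) {
            l->e[i].value = val;
            return 0;
        }
    }
    if (l->nr == MAX_AUTOLOAD)
        return -1;

    l->e[l->nr].index = msr;
    l->e[l->nr].reserved = 0;
    l->e[l->nr].value = val;
    l->nr++;
    return 0;
}
```

The model says nothing about relative cost; it only shows what state each
list entry carries.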
Paolo