On 08/02/21 19:04, Sean Christopherson wrote:
>> That said, the case where we saw MSR autoload as faster involved EFER, and
>> we decided that it was due to TLB flushes (commit f6577a5fa15d, "x86, kvm,
>> vmx: Always use LOAD_IA32_EFER if available", 2014-11-12). Do you know if
>> RDMSR/WRMSR is always slower than MSR autoload?
>
> RDMSR/WRMSR may be marginally slower, but only because the autoload stuff avoids
> serializing the pipeline after every MSR.

That's probably adding up quickly...

> The autoload paths are effectively
> just wrappers around the WRMSR ucode, plus some extra VM-Enter specific checks,
> as ucode needs to perform all the normal fault checks on the index and value.
> On the flip side, if the load lists are dynamically constructed, I suspect the
> code overhead of walking the lists negates any advantages of the load lists.

... but yeah this is not very encouraging.

Context switch time is a problem for XFD. In a VM that uses AMX, most
threads in the guest will have nonzero XFD but the vCPU thread itself
will have zero XFD. So as soon as one thread in the VM forces the vCPU
thread to clear XFD, you pay a price on all vmexits and vmentries.
However, running the host with _more_ bits set than necessary in XFD
should not be a problem as long as the host doesn't use the AMX
instructions. So perhaps Jing can look into keeping XFD=0 for as little
time as possible, and XFD=host_XFD|guest_XFD as much as possible.
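
To make the idea above concrete, here is a rough userspace toy model, not
kernel code. Everything in it is hypothetical (struct sim_cpu, sim_set_xfd,
run_eager, run_lazy); it only counts how many simulated XFD writes each
policy would issue, assuming a write is skipped when the value is unchanged
and that the exact host value is restored once when leaving the run loop.

```c
#include <stdint.h>

/* XSAVE state component 18 is AMX tile data; XFD uses the same bit index. */
#define XFD_AMX_TILEDATA (1ULL << 18)

/* Hypothetical stand-in for one CPU: the XFD value and a WRMSR counter. */
struct sim_cpu {
	uint64_t xfd;
	unsigned int writes;
};

/* Simulated "WRMSR IA32_XFD": only counts writes that change the value. */
static void sim_set_xfd(struct sim_cpu *cpu, uint64_t val)
{
	if (cpu->xfd == val)
		return;
	cpu->xfd = val;
	cpu->writes++;
}

/* Eager policy: exact guest value on entry, exact host value on every exit. */
static unsigned int run_eager(uint64_t host_xfd, uint64_t guest_xfd, int n)
{
	struct sim_cpu cpu = { .xfd = host_xfd, .writes = 0 };

	for (int i = 0; i < n; i++) {
		sim_set_xfd(&cpu, guest_xfd);	/* vmentry */
		sim_set_xfd(&cpu, host_xfd);	/* vmexit */
	}
	return cpu.writes;
}

/*
 * Lazy policy: after an exit, run the host with host_xfd | guest_xfd
 * (extra bits set are harmless as long as the host does not execute AMX
 * instructions), and restore the exact host value only once at the end.
 */
static unsigned int run_lazy(uint64_t host_xfd, uint64_t guest_xfd, int n)
{
	struct sim_cpu cpu = { .xfd = host_xfd, .writes = 0 };

	for (int i = 0; i < n; i++) {
		sim_set_xfd(&cpu, guest_xfd);			/* vmentry */
		sim_set_xfd(&cpu, host_xfd | guest_xfd);	/* vmexit */
	}
	sim_set_xfd(&cpu, host_xfd);	/* leaving the run loop */
	return cpu.writes;
}
```

With host XFD 0 and guest XFD equal to XFD_AMX_TILEDATA, 1000 entry/exit
pairs cost 2000 simulated writes in run_eager() and 2 in run_lazy(). When
the host value has bits that the guest value lacks, the lazy variant still
writes on every transition, so the benefit depends on host_XFD being a
subset of guest_XFD.
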
Paolo