On Mon, Feb 08, 2021 at 07:12:22PM +0100, Paolo Bonzini wrote:
> On 08/02/21 19:04, Sean Christopherson wrote:
> > > That said, the case where we saw MSR autoload as faster involved EFER, and
> > > we decided that it was due to TLB flushes (commit f6577a5fa15d, "x86, kvm,
> > > vmx: Always use LOAD_IA32_EFER if available", 2014-11-12). Do you know if
> > > RDMSR/WRMSR is always slower than MSR autoload?
> > RDMSR/WRMSR may be marginally slower, but only because the autoload stuff avoids
> > serializing the pipeline after every MSR.
>
> That's probably adding up quickly...
>
> > The autoload paths are effectively
> > just wrappers around the WRMSR ucode, plus some extra VM-Enter specific checks,
> > as ucode needs to perform all the normal fault checks on the index and value.
> > On the flip side, if the load lists are dynamically constructed, I suspect the
> > code overhead of walking the lists negates any advantages of the load lists.
>
> ... but yeah this is not very encouraging.
>
> Context switch time is a problem for XFD. In a VM that uses AMX, most
> threads in the guest will have nonzero XFD but the vCPU thread itself will
> have zero XFD. So as soon as one thread in the VM forces the vCPU thread to
> clear XFD, you pay a price on all vmexits and vmentries.
>
> However, running the host with _more_ bits set than necessary in XFD should
> not be a problem as long as the host doesn't use the AMX instructions. So
> perhaps Jing can look into keeping XFD=0 for as little time as possible, and
> XFD=host_XFD|guest_XFD as much as possible.

This sounds like the lazy-fpu (eagerfpu?) that used to be part of the kernel?
I recall that we had a CVE for that - so it may also be worth double-checking
that we don't reintroduce that one.
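A minimal user-space C model of the trade-off Paolo describes, not KVM code. Everything here is hypothetical: `struct xfd_state`, `xfd_update()`, `run_strict()` and `run_lazy()` are illustrative names, the XFD MSR write is simulated by a counter, and bit 18 is assumed to be the XFD bit for AMX TILEDATA. It compares reloading the vCPU thread's own XFD (0) on every VM-exit against running the host with host_XFD|guest_XFD between exits, skipping the write when the value does not change.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: XFD bit 18 arms first-use faults for AMX TILEDATA
 * (XSAVE state component 18). */
#define XFD_XTILEDATA (UINT64_C(1) << 18)

/* Hypothetical model of the per-CPU XFD value; not a kernel structure. */
struct xfd_state {
	uint64_t hw_xfd;     /* value currently in the simulated MSR */
	unsigned int writes; /* simulated WRMSRs issued so far */
};

/* Skip the (simulated) WRMSR when the value would not change. */
void xfd_update(struct xfd_state *s, uint64_t val)
{
	if (s->hw_xfd != val) {
		s->hw_xfd = val;
		s->writes++;
	}
}

/* Strict policy: the host always runs with its own XFD value. */
void run_strict(struct xfd_state *s, uint64_t host_xfd,
		uint64_t guest_xfd, unsigned int exits)
{
	for (unsigned int i = 0; i < exits; i++) {
		xfd_update(s, guest_xfd); /* VM-entry */
		xfd_update(s, host_xfd);  /* VM-exit */
	}
}

/* Lazy policy: after VM-exit the host keeps the guest's bits set.
 * Assumed safe only while the host executes no AMX instructions. */
void run_lazy(struct xfd_state *s, uint64_t host_xfd,
	      uint64_t guest_xfd, unsigned int exits)
{
	for (unsigned int i = 0; i < exits; i++) {
		xfd_update(s, guest_xfd);            /* VM-entry */
		xfd_update(s, host_xfd | guest_xfd); /* VM-exit */
	}
}
```

With host_xfd = 0 and guest_xfd = XFD_XTILEDATA, 100 simulated exits cost 200 writes under run_strict() but only 1 under run_lazy(), because host_XFD|guest_XFD then equals the guest value. The model deliberately leaves out the hard part: any host path that actually touches AMX state would first have to drop back to host_xfd, and that window is what should be kept as short as possible.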