On Tue, 10 Dec 2024 17:32:57 -0800, Sean Christopherson wrote:
> Fix a hilarious/revolting performance regression (relative to older CPU
> generations) in xstate_required_size() that pops up due to CPUID _in the
> host_ taking 3x-4x longer on Emerald Rapids than Skylake.
>
> The issue rears its head on nested virtualization transitions, as KVM
> (unnecessarily) performs runtime CPUID updates, including XSAVE sizes,
> multiple times per transition. And calculating XSAVE sizes, especially
> for vCPUs with a decent number of supported XSAVE features and compacted
> format support, can add up to thousands of cycles.
>
> [...]

Applied 2-5 to kvm-x86 misc, with a changelog that doesn't incorrectly state
that CPUID is a mandatory intercept on AMD.

[1/5] KVM: x86: Cache CPUID.0xD XSTATE offsets+sizes during module init
      (no commit info)
[2/5] KVM: x86: Use for-loop to iterate over XSTATE size entries
      https://github.com/kvm-x86/linux/commit/aa93b6f96f64
[3/5] KVM: x86: Apply TSX_CTRL_CPUID_CLEAR if and only if the vCPU has RTM or HLE
      https://github.com/kvm-x86/linux/commit/7e9f735e7ac4
[4/5] KVM: x86: Query X86_FEATURE_MWAIT iff userspace owns the CPUID feature bit
      https://github.com/kvm-x86/linux/commit/a487f6797c88
[5/5] KVM: x86: Defer runtime updates of dynamic CPUID bits until CPUID emulation
      https://github.com/kvm-x86/linux/commit/93da6af3ae56

--
https://github.com/kvm-x86/linux/tree/next