On 12/11/24 02:32, Sean Christopherson wrote:
Fix a hilarious/revolting performance regression (relative to older CPU generations) in xstate_required_size() that pops up due to CPUID _in the host_ taking 3x-4x longer on Emerald Rapids than Skylake. The issue rears its head on nested virtualization transitions, as KVM (unnecessarily) performs runtime CPUID updates, including XSAVE sizes, multiple times per transition. And calculating XSAVE sizes, especially for vCPUs with a decent number of supported XSAVE features and compacted format support, can add up to thousands of cycles.

To fix the immediate issue, cache the CPUID output at kvm.ko load. The information is static for a given CPU, i.e. doesn't need to be re-read from hardware every time. That's patch 1, and eliminates pretty much all of the meaningful overhead.
Queued this one, thanks!

Paolo