On Wed, Dec 11, 2024 at 09:16:11AM -0800, Sean Christopherson wrote:
> On Tue, Dec 10, 2024, Bernhard Kauer wrote:
> > On Mon, Dec 09, 2024 at 05:40:48PM -0800, Sean Christopherson wrote:
> > > > With a single vCPU pinned to a single pCPU, the average latency for a CPUID exit
> > > > goes from 1018 => 1027 cycles, plus or minus a few.  With 8 vCPUs, no pinning
> > > > (mostly laziness), the average latency goes from 1034 => 1053.
> >
> > Are these kind of benchmarks tracked somewhere automatically?
>
> I'm not sure what you're asking.  The benchmark is KVM-Unit-Test's[*] CPUID test,
> e.g. "./x86/run x86/vmexit.flat -smp 1 -append 'cpuid'".

There are various issues with these benchmarks.

1. The absolute numbers depend on the particular CPU.  My results
   can't be compared to your absolute results.

2. They have a 1% accuracy when warming up and pinning to a CPU.
   Thus one has to do multiple runs.

      1 cpuid 1087
      1 cpuid 1092
      5 cpuid 1093
      4 cpuid 1094
      3 cpuid 1095
     11 cpuid 1096
      8 cpuid 1097
     24 cpuid 1098
     11 cpuid 1099
     17 cpuid 1100
      8 cpuid 1101
      1 cpuid 1102
      4 cpuid 1103
      1 cpuid 1104
      1 cpuid 1110

3. Dynamic frequency scaling makes it even more inaccurate.  A
   previously idle CPU can be as low as 1072 cycles, and without
   pinning even 1050 cycles.  This is 2.4% and 4.6% faster than the
   1098 median.

4. Patches that do not seem worth checking, or whose impact is smaller
   than the measurement uncertainty, might slowly make the system
   slower.

Most of this goes away if a dedicated machine tracks performance
numbers continuously.
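
The arithmetic in points 2 and 3 can be checked with a short script.
This is only a sketch: the histogram is copied from the runs listed
above (count, then cycles), the 1072 and 1050 values are the ones
quoted for an idle and an unpinned CPU, and the helper names expand()
and speedup_percent() are made up for illustration.  It assumes the
percentages are taken relative to the faster value, since that is what
reproduces 2.4% and 4.6%.

```python
from statistics import median

# Histogram of the runs listed above: (number of runs, cycles per CPUID exit).
HISTOGRAM = [
    (1, 1087), (1, 1092), (5, 1093), (4, 1094), (3, 1095),
    (11, 1096), (8, 1097), (24, 1098), (11, 1099), (17, 1100),
    (8, 1101), (1, 1102), (4, 1103), (1, 1104), (1, 1110),
]


def expand(histogram):
    """Turn (count, value) pairs back into a flat list of samples."""
    return [value for count, value in histogram for _ in range(count)]


def speedup_percent(baseline, faster):
    """Percent by which `faster` beats `baseline`, relative to `faster`.

    Assumption: this is the convention behind the 2.4% / 4.6% figures.
    """
    return (baseline / faster - 1.0) * 100.0


samples = expand(HISTOGRAM)
med = median(samples)
# Full min-to-max range as a fraction of the median.
spread = (max(samples) - min(samples)) / med * 100.0

print(f"runs={len(samples)} median={med:.0f} range={spread:.1f}% of median")
print(f"idle CPU (1072 cycles): {speedup_percent(med, 1072):.1f}% faster")
print(f"unpinned (1050 cycles): {speedup_percent(med, 1050):.1f}% faster")
```

The min-to-max range here is about twice the 1% figure, i.e. roughly
plus or minus 1% around the median, so a 9 cycle change such as
1018 => 1027 is within the range a single run can move by itself.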