On Mon, Dec 09, 2024 at 05:40:48PM -0800, Sean Christopherson wrote:
> On Mon, Dec 09, 2024, Sean Christopherson wrote:
> > On Mon, Oct 21, 2024, Bernhard Kauer wrote:
> > > It used a static key to avoid loading the lapic pointer from
> > > the vcpu->arch structure. However, in the common case the load
> > > is from a hot cacheline and the CPU should be able to perfectly
> > > predict it. Thus there is no upside of this premature optimization.
> > >
> > > The downside is that code patching including an IPI to all CPUs
> > > is required whenever the first VM without an lapic is created or
> > > the last is destroyed.
> > >
> > I'm on the fence, slightly leaning towards removing all three of these static keys.

Thanks for continuing this work.

> > With a single vCPU pinned to a single pCPU, the average latency for a CPUID exit
> > goes from 1018 => 1027 cycles, plus or minus a few. With 8 vCPUs, no pinning
> > (mostly laziness), the average latency goes from 1034 => 1053.

Are these kinds of benchmarks tracked somewhere automatically? With that,
one could systematically optimize for faster exits.

> > On the other hand, we lose gobs and gobs of cycles with far less thought. E.g.
> > with mitigations on, the latency for a single vCPU jumps all the way to 1600+ cycles.

In the end it is a tradeoff to be made. The cost of switching between the
modes is more than a hundred microseconds of unexpected latency. On the
other hand, keeping the static keys saves 1-2% per exit but leaves a larger
code base.
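
For reference, the 1-2% figure can be checked against the cycle counts
quoted above. This is only a quick sketch of the arithmetic; the helper name
relative_overhead is made up for illustration and is not part of any
benchmark tooling.

```python
def relative_overhead(before_cycles: float, after_cycles: float) -> float:
    """Return the latency increase as a percentage of the 'before' value."""
    return (after_cycles - before_cycles) / before_cycles * 100.0


# Average CPUID exit latencies quoted in the thread (cycles),
# with the static keys (before) and without them (after).
measurements = {
    "1 vCPU, pinned": (1018, 1027),
    "8 vCPUs, no pinning": (1034, 1053),
}

for name, (before, after) in measurements.items():
    pct = relative_overhead(before, after)
    print(f"{name}: {before} => {after} cycles, +{pct:.2f}%")
```

That works out to a bit under 1% for the pinned single-vCPU case and a bit
under 2% for the 8-vCPU case, given the "plus or minus a few" cycles of noise
mentioned above.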