On Tue, Oct 22, 2024, Bernhard Kauer wrote:
> On Tue, Oct 22, 2024 at 10:32:59AM -0700, Sean Christopherson wrote:
> > On Fri, Oct 18, 2024, Bernhard Kauer wrote:
> > > It used a static key to avoid loading the lapic pointer from
> > > the vcpu->arch structure. However, in the common case the load
> > > is from a hot cacheline and the CPU should be able to perfectly
> > > predict it. Thus there is no upside of this premature optimization.
> >
> > Do you happen to have performance numbers?
>
> Sure. I have some preliminary numbers as I'm still optimizing the
> round-trip time for tiny virtual machines.
>
> A hello-world micro benchmark on my AMD 6850U needs at least 331us. With
> the static keys it requires 579us. That is a 75% increase.

For the first VM only though, correct?

> Take the absolute values with a grain of salt as not all of my patches might
> be applicable to the general case.
>
> For the other side I don't have a relevant benchmark yet. But I doubt you
> would see anything even with a very high IRQ rate.
>
>
> > > The downside is that code patching including an IPI to all CPUs
> > > is required whenever the first VM without an lapic is created or
> > > the last is destroyed.
> >
> > In practice, this almost never happens though. Do you have a use case for
> > creating VMs without in-kernel local APICs?
>
> I switched from "full irqchip" to "no irqchip" due to a significant
> performance gain

Significant performance gain for what path? I'm genuinely curious. Unless your
VM doesn't need a timer and doesn't need interrupts of any kind, emulating the
local APIC in userspace is going to be much less performant.

> and the simplicity it promised.

Similar to above, unless you are not emulating a local APIC anywhere, disabling
KVM's in-kernel local APIC isn't a meaningful change in overall complexity.

> I might have to go to "split irqchip" mode for performance reasons but I
> didn't had time to look into it yet.
>
> So in the end I assume it will be a trade-off: Do I want to rely on these
> 3000 lines of kernel code to gain an X% performance increase, or not?