On Thu, Oct 31, 2024 at 09:24:29AM -0700, Sean Christopherson wrote:
> On Tue, Oct 22, 2024, Bernhard Kauer wrote:
> > > > It used a static key to avoid loading the lapic pointer from
> > > > the vcpu->arch structure. However, in the common case the load
> > > > is from a hot cacheline and the CPU should be able to perfectly
> > > > predict it. Thus there is no upside of this premature optimization.
> > >
> > > Do you happen to have performance numbers?
> >
> > Sure. I have some preliminary numbers as I'm still optimizing the
> > round-trip time for tiny virtual machines.
> >
> > A hello-world micro benchmark on my AMD 6850U needs at least 331us. With
> > the static keys it requires 579us. That is a 75% increase.
>
> For the first VM only though, correct?

That is right. If I keep one VM in the background, the overhead is not
measurable anymore.

> > Take the absolute values with a grain of salt as not all of my patches might
> > be applicable to the general case.
> >
> > For the other side I don't have a relevant benchmark yet. But I doubt you
> > would see anything even with a very high IRQ rate.
> >
> > > > The downside is that code patching including an IPI to all CPUs
> > > > is required whenever the first VM without an lapic is created or
> > > > the last is destroyed.
> > >
> > > In practice, this almost never happens though. Do you have a use case for
> > > creating VMs without in-kernel local APICs?
> >
> > I switched from "full irqchip" to "no irqchip" due to a significant
> > performance gain
>
> Significant performance gain for what path? I'm genuinely curious.

I have this really slow PREEMPT_RT kernel (Debian 6.11.4-rt-amd64).
The hello-world benchmark takes 100ms on average. With IRQCHIP it goes
up to 220ms. An strace shows 83ms for the extra ioctl:

    ioctl(4, KVM_CREATE_IRQCHIP, 0) = 0 <0.083242>

My current theory is that RCU takes ages on this kernel, and creating an
IOAPIC uses SRCU to synchronize the bus array...

However, in my latest benchmark runs the overhead for IRQCHIP is down to
15 microseconds. So no big deal anymore.

> Unless your VM doesn't need a timer and doesn't need interrupts of
> any kind, emulating the local APIC in userspace is going to be much
> less performant.

Do you have any performance numbers?