Re: [PATCH] KVM: drop the kvm_has_noapic_vcpu optimization

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Oct 31, 2024 at 09:24:29AM -0700, Sean Christopherson wrote:
> On Tue, Oct 22, 2024, Bernhard Kauer wrote:
> > > > It used a static key to avoid loading the lapic pointer from
> > > > the vcpu->arch structure.  However, in the common case the load
> > > > is from a hot cacheline and the CPU should be able to perfectly
> > > > predict it. Thus there is no upside of this premature optimization.
> > >
> > > Do you happen to have performance numbers?
> >
> > Sure.  I have some preliminary numbers as I'm still optimizing the
> > round-trip time for tiny virtual machines.
> >
> > A hello-world micro benchmark on my AMD 6850U needs at least 331us.  With
> > the static keys it requires 579us.  That is a 75% increase.
>
> For the first VM only though, correct?

That is right. If I keep one VM in the background the overhead is not
measureable anymore.


> > Take the absolute values with a grain of salt as not all of my patches might
> > be applicable to the general case.
> >
> > For the other side I don't have a relevant benchmark yet.  But I doubt you
> > would see anything even with a very high IRQ rate.
> >
> >
> > > > The downside is that code patching including an IPI to all CPUs
> > > > is required whenever the first VM without an lapic is created or
> > > > the last is destroyed.
> > >
> > > In practice, this almost never happens though.  Do you have a use case for
> > > creating VMs without in-kernel local APICs?
> >
> > I switched from "full irqchip" to "no irqchip" due to a significant
> > performance gain
>
> Signifcant performance gain for what path?  I'm genuinely curious.

I have this really slow PREEMPT_RT kernel (Debian 6.11.4-rt-amd64).
The hello-world benchmark takes on average 100ms.  With IRQCHIP it goes
up to 220ms.  An strace gives 83ms for the extra ioctl:

        ioctl(4, KVM_CREATE_IRQCHIP, 0)         = 0 <0.083242>

My current theory is that RCU takes ages on this kernel.  And creating an
IOAPIC uses SRCU to synchronize the bus array...

However, in my latest benchmark runs the overhead for IRQCHIP is down to 15
microseconds.  So no big deal anymore.


> Unless your VM doesn't need a timer and doesn't need interrupts of
> any kind, emulating the local APIC in userspace is going to be much
> less performant.


Do you have any performance numbers?




[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux