Re: [PATCH v2] KVM: x86: Drop the kvm_has_noapic_vcpu optimization

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



+Paolo, I'm pretty sure he still doesn't subscribe to kvm@ :-)

On Mon, Dec 09, 2024, Sean Christopherson wrote:
> On Mon, Oct 21, 2024, Bernhard Kauer wrote:
> > It used a static key to avoid loading the lapic pointer from
> > the vcpu->arch structure.  However, in the common case the load
> > is from a hot cacheline and the CPU should be able to perfectly
> > predict it. Thus there is no upside of this premature optimization.
> > 
> > The downside is that code patching including an IPI to all CPUs
> > is required whenever the first VM without an lapic is created or
> > the last is destroyed.
> > 
> > Signed-off-by: Bernhard Kauer <bk@xxxxxxxxx>
> > ---
> > 
> > V1->V2: remove spillover from other patch and fix style
> > 
> >  arch/x86/kvm/lapic.c | 10 ++--------
> >  arch/x86/kvm/lapic.h |  6 +-----
> >  arch/x86/kvm/x86.c   |  6 ------
> >  3 files changed, 3 insertions(+), 19 deletions(-)
> > 
> > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> > index 2098dc689088..287a43fae041 100644
> > --- a/arch/x86/kvm/lapic.c
> > +++ b/arch/x86/kvm/lapic.c
> > @@ -135,8 +135,6 @@ static inline int __apic_test_and_clear_vector(int vec, void *bitmap)
> >  	return __test_and_clear_bit(VEC_POS(vec), (bitmap) + REG_POS(vec));
> >  }
> >  
> > -__read_mostly DEFINE_STATIC_KEY_FALSE(kvm_has_noapic_vcpu);
> > -EXPORT_SYMBOL_GPL(kvm_has_noapic_vcpu);
> >  
> >  __read_mostly DEFINE_STATIC_KEY_DEFERRED_FALSE(apic_hw_disabled, HZ);
> >  __read_mostly DEFINE_STATIC_KEY_DEFERRED_FALSE(apic_sw_disabled, HZ);
> 
> I'm on the fence, slightly leaning towards removing all three of these static keys.
> 
> If we remove kvm_has_noapic_vcpu to avoid the text patching, then we should
> definitely drop apic_sw_disabled, as vCPUs are practically guaranteed to toggle
> the S/W enable bit, e.g. it starts out '0' at RESET.  And if we drop apic_sw_disabled,
> then keeping apic_hw_disabled seems rather pointless.
> 
> Removing all three keys is measurable, but the impact is so tiny that I have a
> hard time believing anyone would notice in practice.
> 
> To measure, I tweaked KVM to handle CPUID exits in the fastpath and then ran the
> KVM-Unit-Test CPUID microbenchmark (with some minor modifications).  Handling
> CPUID in the fastpath makes the kvm_lapic_enabled() call in the innermost run loop
> stick out (that helpers checks all three keys/conditions).
> 
> 	for (;;) {
> 		/*
> 		 * Assert that vCPU vs. VM APICv state is consistent.  An APICv
> 		 * update must kick and wait for all vCPUs before toggling the
> 		 * per-VM state, and responding vCPUs must wait for the update
> 		 * to complete before servicing KVM_REQ_APICV_UPDATE.
> 		 */
> 		WARN_ON_ONCE((kvm_vcpu_apicv_activated(vcpu) != kvm_vcpu_apicv_active(vcpu)) &&
> 			     (kvm_get_apic_mode(vcpu) != LAPIC_MODE_DISABLED));
> 
> 		exit_fastpath = kvm_x86_call(vcpu_run)(vcpu,
> 						       req_immediate_exit);
> 		if (likely(exit_fastpath != EXIT_FASTPATH_REENTER_GUEST))
> 			break;
> 
> 		if (kvm_lapic_enabled(vcpu))
> 			kvm_x86_call(sync_pir_to_irr)(vcpu);
> 
> 		if (unlikely(kvm_vcpu_exit_request(vcpu))) {
> 			exit_fastpath = EXIT_FASTPATH_EXIT_HANDLED;
> 			break;
> 		}
> 
> 		/* Note, VM-Exits that go down the "slow" path are accounted below. */
> 		++vcpu->stat.exits;
> 	}
> 
> With a single vCPU pinned to a single pCPU, the average latency for a CPUID exit
> goes from 1018 => 1027 cycles, plus or minus a few.  With 8 vCPUs, no pinning
> (mostly laziness), the average latency goes from 1034 => 1053.
> 
> Other flows that check multiple vCPUs, e.g. kvm_irq_delivery_to_apic(), might be
> more affected?  The optimized APIC map should help for common cases, but KVM does
> still check if APICs are enabled multiple times when delivering interrupts.  And
> that's really my only hesitation: there are checks *everywhere* in KVM.
> 
> On the other hand, we lose gobs and gobs of cycles with far less thought.  E.g.
> with mitigations on, the latency for a single vCPU jumps all the way to 1600+ cycles.
> 
> And while the diff stats are quite nice, the relevant code is low maintenance.
> 
>  arch/x86/kvm/lapic.c | 41 ++---------------------------------------
>  arch/x86/kvm/lapic.h | 19 +++----------------
>  arch/x86/kvm/x86.c   |  4 +---
>  3 files changed, 6 insertions(+), 58 deletions(-)
> 
> Paolo or anyone else... thoughts?




[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux