On Tue, Feb 11, 2025 at 08:37:17AM -0800, Sean Christopherson wrote:
> On Tue, Feb 11, 2025, Naveen N Rao wrote:
> > On Wed, Feb 05, 2025 at 12:36:21PM +0100, Paolo Bonzini wrote:
> > I haven't analyzed this yet, but moving apicv_irq_window into a separate
> > cacheline is improving the performance in my tests by ~7 to 8%:
> > 
> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > index 9e3465e70a0a..d8a40ac49226 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -1355,6 +1355,9 @@ struct kvm_arch {
> >  	struct kvm_ioapic *vioapic;
> >  	struct kvm_pit *vpit;
> >  	atomic_t vapics_in_nmi_mode;
> > +
> > +	atomic_t apicv_irq_window;
> > +
> >  	struct mutex apic_map_lock;
> >  	struct kvm_apic_map __rcu *apic_map;
> >  	atomic_t apic_map_dirty;
> > @@ -1365,7 +1368,6 @@ struct kvm_arch {
> >  	/* Protects apicv_inhibit_reasons */
> >  	struct rw_semaphore apicv_update_lock;
> >  	unsigned long apicv_inhibit_reasons;
> > -	atomic_t apicv_irq_window;
> >  
> >  	gpa_t wall_clock;
> > 
> > I chose that spot before apic_map_lock simply because there was a 4 byte
> > hole there. This happens to also help performance in the AVIC disabled
> > case by a few percentage points (rather, restores the performance in the
> > AVIC disabled case).
> > 
> > Before this change, I was trying to see if we could entirely elide the
> > rwsem read lock in the specific scenario we are seeing the bottleneck.
> > That is, instead of checking for any other inhibit being set, can we
> > specifically test for PIT_REINJ while setting the IRQWIN inhibit? Then,
> > update the inhibit change logic if PIT_REINJ is cleared to re-check the
> > irq window count.
> > 
> > There's probably a race here somewhere, but FWIW, along with the above
> > change to 'struct kvm_arch', this helps improve performance by a few
> > more percentage points helping close the gap to within 2% of the AVIC
> > disabled case.
> 
> I suspect the issue is that apicv_inhibit_reasons is in the same cache line.  That
> field is read on at least every entry
> 
> 	/*
> 	 * Assert that vCPU vs. VM APICv state is consistent.  An APICv
> 	 * update must kick and wait for all vCPUs before toggling the
> 	 * per-VM state, and responding vCPUs must wait for the update
> 	 * to complete before servicing KVM_REQ_APICV_UPDATE.
> 	 */
> 	WARN_ON_ONCE((kvm_vcpu_apicv_activated(vcpu) != kvm_vcpu_apicv_active(vcpu)) &&
> 		     (kvm_get_apic_mode(vcpu) != LAPIC_MODE_DISABLED));
> 
> and when opening an IRQ window in svm_set_vintr()
> 
> 	WARN_ON(kvm_vcpu_apicv_activated(&svm->vcpu));

Possibly, but we also write to apicv_update_lock every time we update
apicv_irq_window in kvm_inc_or_dec_irq_window_inhibit(), and that is in
the same cacheline as apicv_inhibit_reasons:

	if (READ_ONCE(kvm->arch.apicv_inhibit_reasons) & ~BIT(APICV_INHIBIT_REASON_IRQWIN)) {
		guard(rwsem_read)(&kvm->arch.apicv_update_lock);
		if (READ_ONCE(kvm->arch.apicv_inhibit_reasons) & ~BIT(APICV_INHIBIT_REASON_IRQWIN)) {
			atomic_add(add, &kvm->arch.apicv_irq_window);
			return;
		}
	}

Also, note that introducing apicv_irq_window after apicv_inhibit_reasons
is degrading performance in the AVIC disabled case too. So, it is likely
that some other cacheline below apicv_inhibit_reasons in kvm_arch may
also be contributing to this.
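To make the PIT_REINJ idea above a bit more concrete, the fast path in
kvm_inc_or_dec_irq_window_inhibit() would look roughly like the below.
This is only an illustrative sketch, not the exact change I tested; it
reuses the identifiers from the snippet above and simply special-cases
the APICV_INHIBIT_REASON_PIT_REINJ bit:

	/*
	 * Sketch: if PIT re-injection already inhibits APICv, just account
	 * the IRQ window without taking (and writing to) apicv_update_lock.
	 * The path that clears PIT_REINJ would then have to re-check
	 * apicv_irq_window and set/clear the IRQWIN inhibit accordingly.
	 */
	if (READ_ONCE(kvm->arch.apicv_inhibit_reasons) &
	    BIT(APICV_INHIBIT_REASON_PIT_REINJ)) {
		atomic_add(add, &kvm->arch.apicv_irq_window);
		return;
	}

With something along those lines, the hot path in this scenario never
touches the cacheline holding apicv_update_lock and apicv_inhibit_reasons;
the open question is how to safely handle PIT_REINJ being cleared
concurrently.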
> 
> and when handling emulated APIC MMIO in kvm_mmu_faultin_pfn():
> 
> 	/*
> 	 * If the APIC access page exists but is disabled, go directly
> 	 * to emulation without caching the MMIO access or creating a
> 	 * MMIO SPTE.  That way the cache doesn't need to be purged
> 	 * when the AVIC is re-enabled.
> 	 */
> 	if (!kvm_apicv_activated(vcpu->kvm))
> 		return RET_PF_EMULATE;
> 
> Hmm, now that I think about it, lack of emulated MMIO caching might explain
> the 2% gap.  Do you see the same gap if the guest is using x2APIC?

Are you referring to hybrid-AVIC? I am testing this on a Turin system
that has x2AVIC support, so the guest is running in x2APIC/x2AVIC mode
(kvm_amd avic=0/avic=1).


- Naveen