Re: [PATCH 3/3] KVM: x86: Decouple APICv activation state from apicv_inhibit_reasons

Sean Christopherson <seanjc@xxxxxxxxxx> · Tue, 11 Feb 2025 08:37:17 -0800

On Tue, Feb 11, 2025, Naveen N Rao wrote:
> On Wed, Feb 05, 2025 at 12:36:21PM +0100, Paolo Bonzini wrote:
> I haven't analyzed this yet, but moving apicv_irq_window into a separate 
> cacheline is improving the performance in my tests by ~7 to 8%:
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 9e3465e70a0a..d8a40ac49226 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1355,6 +1355,9 @@ struct kvm_arch {
>         struct kvm_ioapic *vioapic;
>         struct kvm_pit *vpit;
>         atomic_t vapics_in_nmi_mode;
> +
> +       atomic_t apicv_irq_window;
> +
>         struct mutex apic_map_lock;
>         struct kvm_apic_map __rcu *apic_map;
>         atomic_t apic_map_dirty;
> @@ -1365,7 +1368,6 @@ struct kvm_arch {
>         /* Protects apicv_inhibit_reasons */
>         struct rw_semaphore apicv_update_lock;
>         unsigned long apicv_inhibit_reasons;
> -       atomic_t apicv_irq_window;
> 
>         gpa_t wall_clock;
> 
> 
> I chose that spot before apic_map_lock simply because there was a 4 byte 
> hole there. This happens to also help performance in the AVIC disabled 
> case by a few percentage points (rather, restores the performance in the 
> AVIC disabled case).
> 
> Before this change, I was trying to see if we could entirely elide the 
> rwsem read lock in the specific scenario we are seeing the bottleneck.  
> That is, instead of checking for any other inhibit being set, can we 
> specifically test for PIT_REINJ while setting the IRQWIN inhibit? Then, 
> update the inhibit change logic if PIT_REINJ is cleared to re-check the 
> irq window count.
> 
> There's probably a race here somewhere, but FWIW, along with the above 
> change to 'struct kvm_arch', this helps improve performance by a few 
> more percentage points helping close the gap to within 2% of the AVIC 
> disabled case.

I suspect the issue is that apicv_inhibit_reasons is in the same cache line.  That
field is read on at least every entry

		/*
		 * Assert that vCPU vs. VM APICv state is consistent.  An APICv
		 * update must kick and wait for all vCPUs before toggling the
		 * per-VM state, and responding vCPUs must wait for the update
		 * to complete before servicing KVM_REQ_APICV_UPDATE.
		 */
		WARN_ON_ONCE((kvm_vcpu_apicv_activated(vcpu) != kvm_vcpu_apicv_active(vcpu)) &&
			     (kvm_get_apic_mode(vcpu) != LAPIC_MODE_DISABLED));

and when opening an IRQ window in svm_set_vintr()

	WARN_ON(kvm_vcpu_apicv_activated(&svm->vcpu));

and when handling emulated APIC MMIO in kvm_mmu_faultin_pfn():

		/*
		 * If the APIC access page exists but is disabled, go directly
		 * to emulation without caching the MMIO access or creating a
		 * MMIO SPTE.  That way the cache doesn't need to be purged
		 * when the AVIC is re-enabled.
		 */
		if (!kvm_apicv_activated(vcpu->kvm))
			return RET_PF_EMULATE;

Hmm, now that I think about it, lack of emulated MMIO caching that might explain
the 2% gap.  Do you see the same gap if the guest is using x2APIC?