Re: [PATCH v2] KVM: x86: Unconditionally set irr_pending when updating APICv state

On Tue, Nov 05, 2024 at 05:51:35PM -0800, Sean Christopherson wrote:
>Always set irr_pending (to true) when updating APICv status to fix a bug
>where KVM fails to set irr_pending when userspace sets APIC state and
>APICv is disabled, which ultimately results in KVM failing to inject the
>pending interrupt(s) that userspace stuffed into the vIRR, until another
>interrupt happens to be emulated by KVM.
>
>Only the APICv-disabled case is flawed, as KVM forces apic->irr_pending to
>be true if APICv is enabled, because not all vIRR updates will be visible
>to KVM.
>
>Hit the bug with a big hammer, even though strictly speaking KVM can scan
>the vIRR and set/clear irr_pending as appropriate for this specific case.
>The bug was introduced by commit 755c2bf87860 ("KVM: x86: lapic: don't
>touch irr_pending in kvm_apic_update_apicv when inhibiting it"), which as
>the shortlog suggests, deleted code that updated irr_pending.
>
>Before that commit, kvm_apic_update_apicv() did indeed scan the vIRR, with
>the crucial difference that kvm_apic_update_apicv() did the scan even
>when APICv was being *disabled*, e.g. due to an AVIC inhibition.
>
>        struct kvm_lapic *apic = vcpu->arch.apic;
>
>        if (vcpu->arch.apicv_active) {
>                /* irr_pending is always true when apicv is activated. */
>                apic->irr_pending = true;
>                apic->isr_count = 1;
>        } else {
>                apic->irr_pending = (apic_search_irr(apic) != -1);
>                apic->isr_count = count_vectors(apic->regs + APIC_ISR);
>        }
>
>And _that_ bug (clearing irr_pending) was introduced by commit b26a695a1d78
>("kvm: lapic: Introduce APICv update helper function"), prior to which KVM
>unconditionally set irr_pending to true in kvm_apic_set_state(), i.e.
>assumed that the new virtual APIC state could have a pending IRQ.
>
>Furthermore, in addition to introducing this issue, commit 755c2bf87860
>also papered over the underlying bug: KVM doesn't ensure CPUs and devices
>see APICv as disabled prior to searching the IRR.  Waiting until KVM
>emulates an EOI to update irr_pending "works", but only because KVM won't
>emulate EOI until after refresh_apicv_exec_ctrl(), and there are plenty of
>memory barriers in between.  I.e. leaving irr_pending set is basically
>hacking around bad ordering.
>
>So, effectively revert to the pre-b26a695a1d78 behavior for state restore,
>even though it's sub-optimal if no IRQs are pending, in order to provide a
>minimal fix, but leave behind a FIXME to document the ugliness.  With luck,
>the ordering issue will be fixed and the mess will be cleaned up in the
>not-too-distant future.
>
>Fixes: 755c2bf87860 ("KVM: x86: lapic: don't touch irr_pending in kvm_apic_update_apicv when inhibiting it")
>Cc: stable@xxxxxxxxxxxxxxx
>Cc: Maxim Levitsky <mlevitsk@xxxxxxxxxx>
>Reported-by: Yong He <zhuangel570@xxxxxxxxx>
>Closes: https://lkml.kernel.org/r/20241023124527.1092810-1-alexyonghe%40tencent.com
>Signed-off-by: Sean Christopherson <seanjc@xxxxxxxxxx>
>---
>
>v2: Go with a big hammer fix, and plan on scanning the vIRR for all cases
>    once the ordering bug has been resolved, i.e. once KVM guarantees the
>    scan happens after CPUs and devices see the new APICv state.
>
>v1: https://lore.kernel.org/all/20241101193532.1817004-1-seanjc@xxxxxxxxxx
>
> arch/x86/kvm/lapic.c | 29 ++++++++++++++++++-----------
> 1 file changed, 18 insertions(+), 11 deletions(-)
>
>diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
>index 65412640cfc7..e470061b744a 100644
>--- a/arch/x86/kvm/lapic.c
>+++ b/arch/x86/kvm/lapic.c
>@@ -2629,19 +2629,26 @@ void kvm_apic_update_apicv(struct kvm_vcpu *vcpu)
> {
> 	struct kvm_lapic *apic = vcpu->arch.apic;
> 
>-	if (apic->apicv_active) {
>-		/* irr_pending is always true when apicv is activated. */
>-		apic->irr_pending = true;
>+	/*
>+	 * When APICv is enabled, KVM must always search the IRR for a pending
>+	 * IRQ, as other vCPUs and devices can set IRR bits even if the vCPU
>+	 * isn't running.  If APICv is disabled, KVM _should_ search the IRR
>+	 * for a pending IRQ.  But KVM currently doesn't ensure *all* hardware,
>+	 * e.g. CPUs and IOMMUs, has seen the change in state, i.e. searching
>+	 * the IRR at this time could race with IRQ delivery from hardware that
>+	 * still sees APICv as being enabled.
>+	 *
>+	 * FIXME: Ensure other vCPUs and devices observe the change in APICv
>+	 *        state prior to updating KVM's metadata caches, so that KVM
>+	 *        can safely search the IRR and set irr_pending accordingly.
>+	 */
>+	apic->irr_pending = true;

Should irr_pending be cleared after the first search of the IRR that finds
no pending IRQ, i.e., in apic_find_highest_irr() when !apic->apicv_active?

Otherwise, irr_pending stays out of sync with the IRR until an interrupt
arrives, and every lookup in the meantime searches an empty IRR. I'm not
sure whether that repeated search is enough overhead to be worth avoiding.
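
To make the question concrete, here is a rough userspace model of the idea,
not kernel code. Everything prefixed with model_ is hypothetical, the struct
is not the real struct kvm_lapic, and the nr_scans counter exists only to
make the repeated searches visible. It assumes the lookup helper returns
early when irr_pending is false and otherwise scans the IRR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_VECTORS 256

/* Hypothetical stand-in for the few lapic fields that matter here. */
struct model_lapic {
	uint32_t irr[NR_VECTORS / 32];	/* one bit per vector */
	bool irr_pending;		/* hint: IRR *may* hold a pending IRQ */
	bool apicv_active;
	unsigned int nr_scans;		/* instrumentation only */
};

/* Full scan of the IRR, highest vector first; -1 if nothing is pending. */
static int model_search_irr(struct model_lapic *apic)
{
	apic->nr_scans++;
	for (int vec = NR_VECTORS - 1; vec >= 0; vec--) {
		if (apic->irr[vec / 32] & (UINT32_C(1) << (vec % 32)))
			return vec;
	}
	return -1;
}

/* Models the state update in the patch: the hint is set unconditionally. */
static void model_update_apicv(struct model_lapic *apic, bool active)
{
	apic->apicv_active = active;
	apic->irr_pending = true;
}

/* Models software (emulated) delivery of an interrupt into the IRR. */
static void model_set_irr(struct model_lapic *apic, int vec)
{
	assert(vec >= 0 && vec < NR_VECTORS);
	apic->irr[vec / 32] |= UINT32_C(1) << (vec % 32);
	apic->irr_pending = true;
}

static int model_find_highest_irr(struct model_lapic *apic)
{
	int vec;

	if (!apic->irr_pending)
		return -1;

	vec = model_search_irr(apic);

	/*
	 * The suggested change: once a scan comes back empty, drop the hint,
	 * but only with APICv disabled, since with APICv enabled the IRR can
	 * be written behind software's back and the hint must stay set.
	 */
	if (vec == -1 && !apic->apicv_active)
		apic->irr_pending = false;

	return vec;
}
```

In this model, with APICv disabled and an empty IRR, only the first lookup
after model_update_apicv() pays for a scan. Note the sketch does not address
the ordering problem described in the FIXME; clearing the hint this early
could hit the same race if hardware still sees APICv as enabled.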



