+Maxim On Fri, Nov 01, 2024, Sean Christopherson wrote: > Explicitly set apic->irr_pending after stuffing the vIRR when userspace > sets APIC state and APICv is disabled, otherwise KVM will skip scanning > the vIRR in subsequent calls to apic_find_highest_irr(), and ultimately > fail to inject the interrupt until another interrupt happens to be added > to the vIRR. > > Only the APICv-disabled case is flawed, as KVM forces apic->irr_pending to > be true if APICv is enabled, because not all vIRR updates will be visible > to KVM. > > Note, irr_pending is intentionally not updated in kvm_apic_update_apicv(), > because when APICv is being inhibited/disabled, KVM needs to keep the flag > set until the next emulated EOI so that KVM will correctly handle any > in-flight updates to the vIRR from hardware. But when setting APIC state, > neither the VM nor the VMM can assume specific ordering between an update > from hardware and overwriting all state in kvm_apic_set_state(), thus KVM > can safely clear irr_pending if the vIRR is empty. > > Reported-by: Yong He <zhuangel570@xxxxxxxxx> > Closes: https://lkml.kernel.org/r/20241023124527.1092810-1-alexyonghe%40tencent.com > Signed-off-by: Sean Christopherson <seanjc@xxxxxxxxxx> > --- > arch/x86/kvm/lapic.c | 9 +++++++++ > 1 file changed, 9 insertions(+) > > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c > index 65412640cfc7..deb73aea2c06 100644 > --- a/arch/x86/kvm/lapic.c > +++ b/arch/x86/kvm/lapic.c > @@ -3086,6 +3086,15 @@ int kvm_apic_set_state(struct kvm_vcpu *vcpu, struct kvm_lapic_state *s) > kvm_x86_call(hwapic_irr_update)(vcpu, > apic_find_highest_irr(apic)); > kvm_x86_call(hwapic_isr_update)(apic_find_highest_isr(apic)); > + } else { > + /* > + * Note, kvm_apic_update_apicv() is responsible for updating > + * isr_count and highest_isr_cache. irr_pending is somewhat > + * special because it mustn't be cleared when APICv is disabled > + * at runtime, and only state restore can cause an IRR bit to > + * be set without also refreshing irr_pending. > + */ > + apic->irr_pending = apic_search_irr(apic) != -1; I did a bit more archaeology in order to give this a Fixes tag (and a Cc: stable), and found two interesting evolutions of this code. The bug was introduced by commit 755c2bf87860 ("KVM: x86: lapic: don't touch irr_pending in kvm_apic_update_apicv when inhibiting it"), which as the shortlog suggests, deleted code that update irr_pending. Before that commit, kvm_apic_update_apicv() did more or less what I am proposing here, with the obvious difference that the proposed fix is specific to kvm_lapic_reset(). struct kvm_lapic *apic = vcpu->arch.apic; if (vcpu->arch.apicv_active) { /* irr_pending is always true when apicv is activated. */ apic->irr_pending = true; apic->isr_count = 1; } else { apic->irr_pending = (apic_search_irr(apic) != -1); apic->isr_count = count_vectors(apic->regs + APIC_ISR); } And _that_ bug (clearing irr_pending) was introduced by commit b26a695a1d78 ("kvm: lapic: Introduce APICv update helper function"). Prior to 97a71c444a14, KVM unconditionally set irr_pending to true in kvm_apic_set_state(), i.e. assumed that the new virtual APIC state could have a pending IRQ (which isn't a terrible assumption. Furthermore, in addition to introducing this issue, commit 755c2bf87860 also papered over the underlying bug: KVM doesn't ensure CPUs and devices see APICv as disabled prior to searching the IRR. Waiting until KVM emulates EOI to update irr_pending works because KVM won't emulate EOI until after refresh_apicv_exec_ctrl(), and because there are plenty of memory barries in between, but leaving irr_pending set is basically hacking around bad ordering, which I _think_ can be fixed by: diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 83fe0a78146f..85d330b56c7e 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -10548,8 +10548,8 @@ void __kvm_vcpu_update_apicv(struct kvm_vcpu *vcpu) goto out; apic->apicv_active = activate; - kvm_apic_update_apicv(vcpu); kvm_x86_call(refresh_apicv_exec_ctrl)(vcpu); + kvm_apic_update_apicv(vcpu); /* * When APICv gets disabled, we may still have injected interrupts So, while searching the IRR is technically sufficient to fix the bug, I'm leaning *very* strongly torward fixing this bug by unconditionally setting irr_pending to true in kvm_apic_update_apicv(), with a FIXME to call out what KVM should be doing. And then address that FIXME in a future series (I have a rather massive pile of fixes and cleanups that are closely related, so there will be ample opportunity). From: Sean Christopherson <seanjc@xxxxxxxxxx> Date: Fri, 1 Nov 2024 12:35:32 -0700 Subject: [PATCH] KVM: x86: Unconditionally set irr_pending when updating APICv state TODO: writeme Fixes: 755c2bf87860 ("KVM: x86: lapic: don't touch irr_pending in kvm_apic_update_apicv when inhibiting it") Cc: stable@xxxxxxxxxxxxxxx Reported-by: Yong He <zhuangel570@xxxxxxxxx> Closes: https://lkml.kernel.org/r/20241023124527.1092810-1-alexyonghe%40tencent.com Signed-off-by: Sean Christopherson <seanjc@xxxxxxxxxx> --- arch/x86/kvm/lapic.c | 29 ++++++++++++++++++----------- 1 file changed, 18 insertions(+), 11 deletions(-) diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index 2098dc689088..95c6beb8ce27 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -2629,19 +2629,26 @@ void kvm_apic_update_apicv(struct kvm_vcpu *vcpu) { struct kvm_lapic *apic = vcpu->arch.apic; - if (apic->apicv_active) { - /* irr_pending is always true when apicv is activated. */ - apic->irr_pending = true; + /* + * When APICv is enabled, KVM must always search the IRR for a pending + * IRQ, as other vCPUs and devices can set IRR bits even if the vCPU + * isn't running. If APICv is disabled, KVM _should_ search the IRR + * for a pending IRQ. But KVM currently doesn't ensure *all* hardware, + * e.g. CPUs and IOMMUs, has seen the change in state, i.e. searching + * the IRR at this time could race with IRQ delivery from hardware that + * still sees APICv as being enabled. + * + * FIXME: Ensure other vCPUs and devices observe the change in APICv + * state prior to updating KVM's metadata caches, so that KVM + * can safely search the IRR and set irr_pending accordingly. + */ + apic->irr_pending = true; + + if (apic->apicv_active) apic->isr_count = 1; - } else { - /* - * Don't clear irr_pending, searching the IRR can race with - * updates from the CPU as APICv is still active from hardware's - * perspective. The flag will be cleared as appropriate when - * KVM injects the interrupt. - */ + else apic->isr_count = count_vectors(apic->regs + APIC_ISR); - } + apic->highest_isr_cache = -1; } base-commit: 8fe4fefefa1b9ea01557d454699c20fdf709e890 --