On Tue, Nov 05, 2024 at 05:51:35PM -0800, Sean Christopherson wrote:
>Always set irr_pending (to true) when updating APICv status to fix a bug
>where KVM fails to set irr_pending when userspace sets APIC state and
>APICv is disabled, which ultimately results in KVM failing to inject the
>pending interrupt(s) that userspace stuffed into the vIRR, until another
>interrupt happens to be emulated by KVM.
>
>Only the APICv-disabled case is flawed, as KVM forces apic->irr_pending to
>be true if APICv is enabled, because not all vIRR updates will be visible
>to KVM.
>
>Hit the bug with a big hammer, even though strictly speaking KVM can scan
>the vIRR and set/clear irr_pending as appropriate for this specific case.
>The bug was introduced by commit 755c2bf87860 ("KVM: x86: lapic: don't
>touch irr_pending in kvm_apic_update_apicv when inhibiting it"), which as
>the shortlog suggests, deleted code that updated irr_pending.
>
>Before that commit, kvm_apic_update_apicv() did indeed scan the vIRR, with
>the crucial difference that kvm_apic_update_apicv() did the scan even
>when APICv was being *disabled*, e.g. due to an AVIC inhibition.
>
>	struct kvm_lapic *apic = vcpu->arch.apic;
>
>	if (vcpu->arch.apicv_active) {
>		/* irr_pending is always true when apicv is activated. */
>		apic->irr_pending = true;
>		apic->isr_count = 1;
>	} else {
>		apic->irr_pending = (apic_search_irr(apic) != -1);
>		apic->isr_count = count_vectors(apic->regs + APIC_ISR);
>	}
>
>And _that_ bug (clearing irr_pending) was introduced by commit b26a695a1d78
>("kvm: lapic: Introduce APICv update helper function"), prior to which KVM
>unconditionally set irr_pending to true in kvm_apic_set_state(), i.e.
>assumed that the new virtual APIC state could have a pending IRQ.
>
>Furthermore, in addition to introducing this issue, commit 755c2bf87860
>also papered over the underlying bug: KVM doesn't ensure CPUs and devices
>see APICv as disabled prior to searching the IRR.  Waiting until KVM
>emulates an EOI to update irr_pending "works", but only because KVM won't
>emulate EOI until after refresh_apicv_exec_ctrl(), and there are plenty of
>memory barriers in between.  I.e. leaving irr_pending set is basically
>hacking around bad ordering.
>
>So, effectively revert to the pre-b26a695a1d78 behavior for state restore,
>even though it's sub-optimal if no IRQs are pending, in order to provide a
>minimal fix, but leave behind a FIXME to document the ugliness.  With luck,
>the ordering issue will be fixed and the mess will be cleaned up in the
>not-too-distant future.
>
>Fixes: 755c2bf87860 ("KVM: x86: lapic: don't touch irr_pending in kvm_apic_update_apicv when inhibiting it")
>Cc: stable@xxxxxxxxxxxxxxx
>Cc: Maxim Levitsky <mlevitsk@xxxxxxxxxx>
>Reported-by: Yong He <zhuangel570@xxxxxxxxx>
>Closes: https://lkml.kernel.org/r/20241023124527.1092810-1-alexyonghe%40tencent.com
>Signed-off-by: Sean Christopherson <seanjc@xxxxxxxxxx>
>---
>
>v2: Go with a big hammer fix, and plan on scanning the vIRR for all cases
>    once the ordering bug has been resolved, i.e. once KVM guarantees the
>    scan happens after CPUs and devices see the new APICv state.
>
>v1: https://lore.kernel.org/all/20241101193532.1817004-1-seanjc@xxxxxxxxxx
>
> arch/x86/kvm/lapic.c | 29 ++++++++++++++++++-----------
> 1 file changed, 18 insertions(+), 11 deletions(-)
>
>diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
>index 65412640cfc7..e470061b744a 100644
>--- a/arch/x86/kvm/lapic.c
>+++ b/arch/x86/kvm/lapic.c
>@@ -2629,19 +2629,26 @@ void kvm_apic_update_apicv(struct kvm_vcpu *vcpu)
> {
> 	struct kvm_lapic *apic = vcpu->arch.apic;
>
>-	if (apic->apicv_active) {
>-		/* irr_pending is always true when apicv is activated. */
>-		apic->irr_pending = true;
>+	/*
>+	 * When APICv is enabled, KVM must always search the IRR for a pending
>+	 * IRQ, as other vCPUs and devices can set IRR bits even if the vCPU
>+	 * isn't running.  If APICv is disabled, KVM _should_ search the IRR
>+	 * for a pending IRQ.  But KVM currently doesn't ensure *all* hardware,
>+	 * e.g. CPUs and IOMMUs, has seen the change in state, i.e. searching
>+	 * the IRR at this time could race with IRQ delivery from hardware that
>+	 * still sees APICv as being enabled.
>+	 *
>+	 * FIXME: Ensure other vCPUs and devices observe the change in APICv
>+	 *	  state prior to updating KVM's metadata caches, so that KVM
>+	 *	  can safely search the IRR and set irr_pending accordingly.
>+	 */
>+	apic->irr_pending = true;

Should irr_pending be cleared after the first search of the IRR that finds no
pending IRQ, i.e., in apic_find_highest_irr() when !apic->apicv_active?
Otherwise, irr_pending will stay out of sync with the IRR until the next
interrupt arrives. I'm not sure whether we want to avoid the unnecessary
overhead of repeatedly searching the IRR.
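
To make the question concrete, here is a rough userspace sketch of what I have
in mind. It is not KVM code and is untested against the real lapic.c: every
toy_* name and the struct layout are made up for illustration, and it ignores
the ordering problem described in the FIXME above, i.e. it assumes nothing
else can set IRR bits concurrently once APICv is inactive.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_NR_VECTORS	256
#define TOY_IRR_WORDS	(TOY_NR_VECTORS / 32)

/* Hypothetical stand-in for the few kvm_lapic fields discussed here. */
struct toy_lapic {
	uint32_t irr[TOY_IRR_WORDS];	/* one bit per vector */
	bool irr_pending;		/* cached "IRR may be non-empty" hint */
	bool apicv_active;
};

/* Full scan: return the highest vector set in the IRR, or -1 if empty. */
int toy_search_irr(const struct toy_lapic *apic)
{
	for (int word = TOY_IRR_WORDS - 1; word >= 0; word--) {
		uint32_t bits = apic->irr[word];

		if (!bits)
			continue;

		for (int bit = 31; bit >= 0; bit--) {
			if (bits & (UINT32_C(1) << bit))
				return word * 32 + bit;
		}
	}
	return -1;
}

/* Software-emulated delivery: set the vector and mark the hint. */
void toy_set_irr(struct toy_lapic *apic, int vec)
{
	assert(vec >= 0 && vec < TOY_NR_VECTORS);
	apic->irr[vec / 32] |= UINT32_C(1) << (vec % 32);
	apic->irr_pending = true;
}

/*
 * Skip the scan when the hint says the IRR is empty.  If the hint was set
 * but the scan finds nothing, drop the hint, but only when APICv is
 * inactive; with APICv active the hint has to stay set because IRR
 * updates may not be visible to software.
 */
int toy_find_highest_irr(struct toy_lapic *apic)
{
	int vec;

	if (!apic->irr_pending)
		return -1;

	vec = toy_search_irr(apic);
	if (vec == -1 && !apic->apicv_active)
		apic->irr_pending = false;

	return vec;
}
```

With this, a state restore that leaves the IRR empty costs one wasted scan
instead of a scan on every lookup until the next emulated interrupt. Whether
the extra branch is worth it, and whether it is even safe before the ordering
issue is fixed, is exactly what I'm unsure about.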