On Wed, Nov 06, 2024 at 05:54:19AM -0800, Sean Christopherson wrote:
>On Wed, Nov 06, 2024, Chao Gao wrote:
>> >Furthermore, in addition to introducing this issue, commit 755c2bf87860 also
>> >papered over the underlying bug: KVM doesn't ensure CPUs and devices see APICv
>> >as disabled prior to searching the IRR. Waiting until KVM emulates EOI to update
>> >irr_pending works because KVM won't emulate EOI until after refresh_apicv_exec_ctrl(),
>> >and because there are plenty of memory barriers in between, but leaving irr_pending
>> >set is basically hacking around bad ordering, which I _think_ can be fixed by:
>> >
>> >diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> >index 83fe0a78146f..85d330b56c7e 100644
>> >--- a/arch/x86/kvm/x86.c
>> >+++ b/arch/x86/kvm/x86.c
>> >@@ -10548,8 +10548,8 @@ void __kvm_vcpu_update_apicv(struct kvm_vcpu *vcpu)
>> >                goto out;
>> >
>> >        apic->apicv_active = activate;
>> >-       kvm_apic_update_apicv(vcpu);
>> >        kvm_x86_call(refresh_apicv_exec_ctrl)(vcpu);
>> >+       kvm_apic_update_apicv(vcpu);
>>
>> I may miss something important. how does this change ensure CPUs and devices see
>> APICv as disabled (thus won't manipulate the vCPU's IRR)? Other CPUs when
>> performing IPI virtualization just looks up the PID_table while IOMMU looks up
>> the IRTE table. ->refresh_apicv_exec_ctrl() doesn't change any of them.
>
>For Intel, which is a bug (one of many in this area). AMD does update both. The
>failure Maxim was addressing was on AMD (AVIC), which has many more scenarios where
>it needs to be inhibited/disabled.

Yes, indeed. Actually, the commit below already fixes the bug for Intel. Its
approach just isn't to make other CPUs and devices see APICv as disabled.
Instead, it picks up all pending IRQs (in the PIR) before VM-entry and cancels
the VM-entry if needed.

commit 7e1901f6c86c896acff6609e0176f93f756d8b2a
Author: Paolo Bonzini <pbonzini@xxxxxxxxxx>
Date:   Mon Nov 22 19:43:09 2021 -0500

    KVM: VMX: prepare sync_pir_to_irr for running with APICv disabled

    If APICv is disabled for this vCPU, assigned devices may still attempt to
    post interrupts.  In that case, we need to cancel the vmentry and deliver
    the interrupt with KVM_REQ_EVENT.  Extend the existing code that handles
    injection of L1 interrupts into L2 to cover this case as well.

    vmx_hwapic_irr_update is only called when APICv is active so it would be
    confusing to add a check for vcpu->arch.apicv_active in there.  Instead,
    just use vmx_set_rvi directly in vmx_sync_pir_to_irr.
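
To illustrate the idea (drain the PIR before VM-entry and cancel the entry when APICv is off), here is a rough user-space toy model. It is not the kernel code: every name in it (toy_vcpu, toy_post_interrupt, toy_sync_pir_to_irr, toy_try_vmentry) is made up for illustration, and it is single-threaded, whereas the real PIR is updated concurrently by other CPUs and devices and so has to be handled atomically.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: these types and helpers are hypothetical, not KVM's. */
#define NR_VECTORS 256
#define NR_WORDS   (NR_VECTORS / 32)

struct toy_vcpu {
	uint32_t pir[NR_WORDS];	/* posted-interrupt requests from devices/CPUs */
	uint32_t irr[NR_WORDS];	/* software view of the virtual APIC IRR */
	bool apicv_active;
	bool req_event;		/* stands in for KVM_REQ_EVENT */
};

/* A device or remote CPU posts a vector without knowing if APICv is active. */
static void toy_post_interrupt(struct toy_vcpu *v, unsigned int vec)
{
	v->pir[vec / 32] |= UINT32_C(1) << (vec % 32);
}

/* Move every pending PIR bit into the IRR; return true if anything moved. */
static bool toy_sync_pir_to_irr(struct toy_vcpu *v)
{
	bool got_new = false;

	for (int i = 0; i < NR_WORDS; i++) {
		if (v->pir[i]) {
			v->irr[i] |= v->pir[i];
			v->pir[i] = 0;
			got_new = true;
		}
	}

	/*
	 * With APICv off, hardware will not deliver from the IRR on its own,
	 * so request software injection instead.
	 */
	if (got_new && !v->apicv_active)
		v->req_event = true;

	return got_new;
}

/* Return true if the guest may be entered, false if entry is cancelled. */
static bool toy_try_vmentry(struct toy_vcpu *v)
{
	toy_sync_pir_to_irr(v);
	return !v->req_event;
}
```

In this model, posting a vector to a vCPU with apicv_active == false makes toy_try_vmentry() return false with req_event set, so the interrupt is not lost even though the poster never saw APICv as disabled. With apicv_active == true the same sequence lets the entry proceed.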