On Mon, Dec 13, 2021, Maxim Levitsky wrote:
> If svm_deliver_avic_intr is called just after the target vcpu's AVIC got
> inhibited, it might read a stale value of vcpu->arch.apicv_active, which
> can lead to the target vCPU not noticing the interrupt.
>
> To fix this, use load-acquire/store-release so that, if the target vCPU
> is IN_GUEST_MODE, we're guaranteed to see a previous disabling of the
> AVIC.  If AVIC has been disabled in the meantime, proceed with the
> KVM_REQ_EVENT-based delivery.
>
> All this complicated logic is actually exactly how we can handle an
> incomplete IPI vmexit; the only difference lies in who sets IRR, whether
> KVM or the processor.
>
> An incomplete IPI vmexit also has the same races as svm_deliver_avic_intr,
> therefore just reuse avic_kick_target_vcpu for it as well.
>
> Reported-by: Maxim Levitsky <mlevitsk@xxxxxxxxxx>

Heh, probably don't need a Reported-by for a patch you wrote :-)

> Co-developed-with: Paolo Bonzini <pbonzini@xxxxxxxxxx>

Co-developed-by: is preferred, and should be accompanied by Paolo's SoB.

> Signed-off-by: Maxim Levitsky <mlevitsk@xxxxxxxxxx>
> ---
>  arch/x86/kvm/svm/avic.c | 85 +++++++++++++++++++++++++----------------
>  arch/x86/kvm/x86.c      |  4 +-
>  2 files changed, 55 insertions(+), 34 deletions(-)
>
> diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
> index 90364d02f22aa..34f62da2fbadd 100644
> --- a/arch/x86/kvm/svm/avic.c
> +++ b/arch/x86/kvm/svm/avic.c
> @@ -289,6 +289,47 @@ static int avic_init_backing_page(struct kvm_vcpu *vcpu)
>  	return 0;
>  }
>
> +static void avic_kick_target_vcpu(struct kvm_vcpu *vcpu)
> +{
> +	bool in_guest_mode;
> +
> +	/*
> +	 * vcpu->arch.apicv_active is read after vcpu->mode. Pairs

This should say "must be read", not "is read".  It's obvious from the code
that apicv_active is read second; the comment is there to say that it _must_
be read after vcpu->mode.

> +	 * with smp_store_release in vcpu_enter_guest.
> +	 */
> +	in_guest_mode = (smp_load_acquire(&vcpu->mode) == IN_GUEST_MODE);

IMO, it's marginally clearer to initialize the bool:

	bool in_guest_mode = (smp_load_acquire(&vcpu->mode) == IN_GUEST_MODE);

> +	if (READ_ONCE(vcpu->arch.apicv_active)) {
> +		if (in_guest_mode) {
> +			/*
> +			 * Signal the doorbell to tell hardware to inject the IRQ if the vCPU
> +			 * is in the guest. If the vCPU is not in the guest, hardware will
> +			 * automatically process AVIC interrupts at VMRUN.

Might as well wrap these comments at 80 chars since they're being moved.

Or maybe even better....

	/* blah blah blah */
	if (!READ_ONCE(vcpu->arch.apicv_active)) {
		kvm_make_request(KVM_REQ_EVENT, vcpu);
		kvm_vcpu_kick(vcpu);
		return;
	}

	if (in_guest_mode) {
		...
	} else {
		....
	}

...so that the existing comments can be preserved as is.
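FWIW, folding all of the above together, the helper would end up looking
something like this (completely untested, comments borrowed from the existing
code and the suggestions in this mail):

static void avic_kick_target_vcpu(struct kvm_vcpu *vcpu)
{
	/*
	 * vcpu->arch.apicv_active must be read after vcpu->mode; pairs with
	 * smp_store_release() in vcpu_enter_guest().
	 */
	bool in_guest_mode = (smp_load_acquire(&vcpu->mode) == IN_GUEST_MODE);

	/*
	 * Manually signal the event; the __apic_accept_irq() fallback path
	 * can't be used if AVIC is disabled after the vector is already
	 * queued in the vIRR.
	 */
	if (!READ_ONCE(vcpu->arch.apicv_active)) {
		kvm_make_request(KVM_REQ_EVENT, vcpu);
		kvm_vcpu_kick(vcpu);
		return;
	}

	if (in_guest_mode) {
		/*
		 * Signal the doorbell to tell hardware to inject the IRQ.
		 * A doorbell to the wrong/previous pCPU is harmless, as the
		 * vCPU is guaranteed to process pending interrupts at its
		 * next VMRUN.
		 */
		int cpu = READ_ONCE(vcpu->cpu);

		if (cpu != get_cpu())
			wrmsrl(SVM_AVIC_DOORBELL, kvm_cpu_get_apicid(cpu));
		put_cpu();
	} else {
		/*
		 * Wake the vCPU if it was blocking.  KVM will then detect
		 * the pending IRQ when checking if the vCPU has a wake event.
		 */
		kvm_vcpu_wake_up(vcpu);
	}
}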
> +			 *
> +			 * Note, the vCPU could get migrated to a different pCPU at any
> +			 * point, which could result in signalling the wrong/previous
> +			 * pCPU.  But if that happens the vCPU is guaranteed to do a
> +			 * VMRUN (after being migrated) and thus will process pending
> +			 * interrupts, i.e. a doorbell is not needed (and the spurious
> +			 * one is harmless).
> +			 */
> +			int cpu = READ_ONCE(vcpu->cpu);
> +			if (cpu != get_cpu())
> +				wrmsrl(SVM_AVIC_DOORBELL, kvm_cpu_get_apicid(cpu));
> +			put_cpu();
> +		} else {
> +			/*
> +			 * Wake the vCPU if it was blocking. KVM will then detect the
> +			 * pending IRQ when checking if the vCPU has a wake event.
> +			 */
> +			kvm_vcpu_wake_up(vcpu);
> +		}
> +	} else {
> +		/* Compare this case with __apic_accept_irq. */

Honestly, this comment isn't very helpful.  It only takes a few lines to say:

		/*
		 * Manually signal the event; the __apic_accept_irq() fallback
		 * path can't be used if AVIC is disabled after the vector is
		 * already queued in the vIRR.
		 */

(incorporating more feedback below)

> +		kvm_make_request(KVM_REQ_EVENT, vcpu);
> +		kvm_vcpu_kick(vcpu);
> +	}
> +}
> +
>  static void avic_kick_target_vcpus(struct kvm *kvm, struct kvm_lapic *source,
>  				   u32 icrl, u32 icrh)
>  {
> @@ -304,8 +345,10 @@ static void avic_kick_target_vcpus(struct kvm *kvm, struct kvm_lapic *source,
>  	kvm_for_each_vcpu(i, vcpu, kvm) {
>  		if (kvm_apic_match_dest(vcpu, source, icrl & APIC_SHORT_MASK,
>  					GET_APIC_DEST_FIELD(icrh),
> -					icrl & APIC_DEST_MASK))
> -			kvm_vcpu_wake_up(vcpu);
> +					icrl & APIC_DEST_MASK)) {
> +			vcpu->arch.apic->irr_pending = true;
> +			avic_kick_target_vcpu(vcpu);
> +		}
>  	}
>  }
>
> @@ -671,9 +714,12 @@ void svm_load_eoi_exitmap(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap)
>
>  int svm_deliver_avic_intr(struct kvm_vcpu *vcpu, int vec)
>  {
> -	if (!vcpu->arch.apicv_active)
> -		return -1;
> -
> +	/*
> +	 * Below, we have to handle the case of AVIC being disabled in the
> +	 * middle of this function anyway, and there is hardly any overhead
> +	 * if AVIC is disabled.  So we do not bother returning -1 and handle
> +	 * the kick ourselves for disabled APICv.

Hmm, my preference would be to keep the "return -1" even though apicv_active
must be rechecked.  That would help highlight that returning "failure" after
this point is not an option, as it would result in kvm_lapic_set_irr() being
called twice (see the sketch at the end of this mail).

> +	 */
>  	kvm_lapic_set_irr(vec, vcpu->arch.apic);
>
>  	/*
> @@ -684,34 +730,7 @@ int svm_deliver_avic_intr(struct kvm_vcpu *vcpu, int vec)
>  	 * the doorbell if the vCPU is already running in the guest.
>  	 */
>  	smp_mb__after_atomic();
> -
> -	/*
> -	 * Signal the doorbell to tell hardware to inject the IRQ if the vCPU
> -	 * is in the guest. If the vCPU is not in the guest, hardware will
> -	 * automatically process AVIC interrupts at VMRUN.
> -	 */
> -	if (vcpu->mode == IN_GUEST_MODE) {
> -		int cpu = READ_ONCE(vcpu->cpu);
> -
> -		/*
> -		 * Note, the vCPU could get migrated to a different pCPU at any
> -		 * point, which could result in signalling the wrong/previous
> -		 * pCPU. But if that happens the vCPU is guaranteed to do a
> -		 * VMRUN (after being migrated) and thus will process pending
> -		 * interrupts, i.e. a doorbell is not needed (and the spurious
> -		 * one is harmless).
> -		 */
> -		if (cpu != get_cpu())
> -			wrmsrl(SVM_AVIC_DOORBELL, kvm_cpu_get_apicid(cpu));
> -		put_cpu();
> -	} else {
> -		/*
> -		 * Wake the vCPU if it was blocking. KVM will then detect the
> -		 * pending IRQ when checking if the vCPU has a wake event.
> -		 */
> -		kvm_vcpu_wake_up(vcpu);
> -	}
> -
> +	avic_kick_target_vcpu(vcpu);
>  	return 0;
>  }
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 85127b3e3690b..81a74d86ee5eb 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -9869,7 +9869,9 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>  	 * result in virtual interrupt delivery.
>  	 */
>  	local_irq_disable();
> -	vcpu->mode = IN_GUEST_MODE;
> +
> +	/* Store vcpu->apicv_active before vcpu->mode. */
> +	smp_store_release(&vcpu->mode, IN_GUEST_MODE);
>
>  	srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx);
>
> --
> 2.26.3
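To be explicit about the "return -1" suggestion above, I'm thinking something
like this (again completely untested, existing smp_mb__after_atomic() comment
omitted for brevity):

int svm_deliver_avic_intr(struct kvm_vcpu *vcpu, int vec)
{
	if (!READ_ONCE(vcpu->arch.apicv_active))
		return -1;

	/*
	 * Returning "failure" after this point is not an option, it would
	 * result in kvm_lapic_set_irr() being called twice.  The recheck of
	 * apicv_active after setting the vIRR is handled by
	 * avic_kick_target_vcpu().
	 */
	kvm_lapic_set_irr(vec, vcpu->arch.apic);

	smp_mb__after_atomic();
	avic_kick_target_vcpu(vcpu);
	return 0;
}

That way the __apic_accept_irq() fallback still handles the common case where
AVIC is already inhibited when the interrupt arrives, and the KVM_REQ_EVENT
path in avic_kick_target_vcpu() only handles the race where AVIC is inhibited
after the vector is queued in the vIRR.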