On Tue, 2022-01-04 at 22:52 +0000, Sean Christopherson wrote:
> On Mon, Dec 13, 2021, Maxim Levitsky wrote:
> > If svm_deliver_avic_intr is called just after the target vcpu's AVIC got
> > inhibited, it might read a stale value of vcpu->arch.apicv_active,
> > which can lead to the target vCPU not noticing the interrupt.
> > 
> > To fix this, use load-acquire/store-release so that, if the target vCPU
> > is IN_GUEST_MODE, we're guaranteed to see a previous disabling of the
> > AVIC.  If AVIC has been disabled in the meantime, proceed with the
> > KVM_REQ_EVENT-based delivery.
> > 
> > All this complicated logic is actually exactly how we can handle an
> > incomplete IPI vmexit; the only difference lies in who sets IRR, whether
> > KVM or the processor.
> > 
> > Also, the incomplete IPI vmexit has the same races as
> > svm_deliver_avic_intr, therefore just reuse avic_kick_target_vcpu
> > for it as well.
> > 
> > Reported-by: Maxim Levitsky <mlevitsk@xxxxxxxxxx>
> 
> Heh, probably don't need a Reported-by for a patch you wrote :-)

Paolo gave me this version; I pretty much sent it as is. We went through a
few iterations of this patch before we agreed that the race is finally gone.

> 
> > Co-developed-with: Paolo Bonzini <pbonzini@xxxxxxxxxx>
> 
> Co-developed-by: is preferred, and should be accompanied by Paolo's SoB.

This is the first time I've used this format, so I didn't know about that.

> 
> > Signed-off-by: Maxim Levitsky <mlevitsk@xxxxxxxxxx>
> > ---
> >  arch/x86/kvm/svm/avic.c | 85 +++++++++++++++++++++++++----------------
> >  arch/x86/kvm/x86.c      |  4 +-
> >  2 files changed, 55 insertions(+), 34 deletions(-)
> > 
> > diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
> > index 90364d02f22aa..34f62da2fbadd 100644
> > --- a/arch/x86/kvm/svm/avic.c
> > +++ b/arch/x86/kvm/svm/avic.c
> > @@ -289,6 +289,47 @@ static int avic_init_backing_page(struct kvm_vcpu *vcpu)
> >  	return 0;
> >  }
> >  
> > +static void avic_kick_target_vcpu(struct kvm_vcpu *vcpu)
> > +{
> > +	bool in_guest_mode;
> > +
> > +	/*
> > +	 * vcpu->arch.apicv_active is read after vcpu->mode. Pairs
> 
> This should say "must be read", not "is read".  It's obvious from the code
> that apicv_active is read second; the comment is there to say that it _must_
> be read after vcpu->mode.
> 
> > +	 * with smp_store_release in vcpu_enter_guest.
> > +	 */
> > +	in_guest_mode = (smp_load_acquire(&vcpu->mode) == IN_GUEST_MODE);
> 
> IMO, it's marginally cleaner to initialize the bool:
> 
> 	bool in_guest_mode = (smp_load_acquire(&vcpu->mode) == IN_GUEST_MODE);
> 
> > +	if (READ_ONCE(vcpu->arch.apicv_active)) {
> > +		if (in_guest_mode) {
> > +			/*
> > +			 * Signal the doorbell to tell hardware to inject the IRQ if the vCPU
> > +			 * is in the guest.  If the vCPU is not in the guest, hardware will
> > +			 * automatically process AVIC interrupts at VMRUN.
> 
> Might as well wrap these comments at 80 chars since they're being moved.  Or
> maybe even better....
> 
> 	/* blah blah blah */
> 	if (!READ_ONCE(vcpu->arch.apicv_active)) {
> 		kvm_make_request(KVM_REQ_EVENT, vcpu);
> 		kvm_vcpu_kick(vcpu);
> 		return;
> 	}
> 
> 	if (in_guest_mode) {
> 		...
> 	} else {
> 		....
> 	}
> 
> ...so that the existing comments can be preserved as is.
> 
> > +			 *
> > +			 * Note, the vCPU could get migrated to a different pCPU at any
> > +			 * point, which could result in signalling the wrong/previous
> > +			 * pCPU.  But if that happens the vCPU is guaranteed to do a
> > +			 * VMRUN (after being migrated) and thus will process pending
> > +			 * interrupts, i.e. a doorbell is not needed (and the spurious
> > +			 * one is harmless).
> > +			 */
> > +			int cpu = READ_ONCE(vcpu->cpu);
> > +			if (cpu != get_cpu())
> > +				wrmsrl(SVM_AVIC_DOORBELL, kvm_cpu_get_apicid(cpu));
> > +			put_cpu();
> > +		} else {
> > +			/*
> > +			 * Wake the vCPU if it was blocking.  KVM will then detect the
> > +			 * pending IRQ when checking if the vCPU has a wake event.
> > +			 */
> > +			kvm_vcpu_wake_up(vcpu);
> > +		}
> > +	} else {
> > +		/* Compare this case with __apic_accept_irq. */
> 
> Honestly, this comment isn't very helpful.  It only takes a few lines to say:
> 
> 		/*
> 		 * Manually signal the event, the __apic_accept_irq() fallback
> 		 * path can't be used if AVIC is disabled after the vector is
> 		 * already queued in the vIRR.
> 		 */
> 
> (incorporating more feedback below)
> 
> > +		kvm_make_request(KVM_REQ_EVENT, vcpu);
> > +		kvm_vcpu_kick(vcpu);
> > +	}
> > +}
> > +
> >  static void avic_kick_target_vcpus(struct kvm *kvm, struct kvm_lapic *source,
> >  				   u32 icrl, u32 icrh)
> >  {
> > @@ -304,8 +345,10 @@ static void avic_kick_target_vcpus(struct kvm *kvm, struct kvm_lapic *source,
> >  	kvm_for_each_vcpu(i, vcpu, kvm) {
> >  		if (kvm_apic_match_dest(vcpu, source, icrl & APIC_SHORT_MASK,
> >  					GET_APIC_DEST_FIELD(icrh),
> > -					icrl & APIC_DEST_MASK))
> > -			kvm_vcpu_wake_up(vcpu);
> > +					icrl & APIC_DEST_MASK)) {
> > +			vcpu->arch.apic->irr_pending = true;
> > +			avic_kick_target_vcpu(vcpu);
> > +		}
> >  	}
> >  }
> >  
> > @@ -671,9 +714,12 @@ void svm_load_eoi_exitmap(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap)
> >  
> >  int svm_deliver_avic_intr(struct kvm_vcpu *vcpu, int vec)
> >  {
> > -	if (!vcpu->arch.apicv_active)
> > -		return -1;
> > -
> > +	/*
> > +	 * Below, we have to handle anyway the case of AVIC being disabled
> > +	 * in the middle of this function, and there is hardly any overhead
> > +	 * if AVIC is disabled.  So, we do not bother returning -1 and handle
> > +	 * the kick ourselves for disabled APICv.
> 
> Hmm, my preference would be to keep the "return -1" even though apicv_active
> must be rechecked.  That would help highlight that returning "failure" after
> this point is not an option as it would result in kvm_lapic_set_irr() being
> called twice.

I don't mind either way - keeping the "return -1" will also fix the
tracepoint I recently added to report the number of interrupts that were
delivered by AVIC/APICv; with this patch as is, all of them count as such.

I will also address all the other feedback about the comments and send a new
version soon.

Thanks for the review!

Best regards,
	Maxim Levitsky

> 
> > +	 */
> >  	kvm_lapic_set_irr(vec, vcpu->arch.apic);
> >  
> >  	/*
> > @@ -684,34 +730,7 @@ int svm_deliver_avic_intr(struct kvm_vcpu *vcpu, int vec)
> >  	 * the doorbell if the vCPU is already running in the guest.
> >  	 */
> >  	smp_mb__after_atomic();
> > -
> > -	/*
> > -	 * Signal the doorbell to tell hardware to inject the IRQ if the vCPU
> > -	 * is in the guest.  If the vCPU is not in the guest, hardware will
> > -	 * automatically process AVIC interrupts at VMRUN.
> > -	 */
> > -	if (vcpu->mode == IN_GUEST_MODE) {
> > -		int cpu = READ_ONCE(vcpu->cpu);
> > -
> > -		/*
> > -		 * Note, the vCPU could get migrated to a different pCPU at any
> > -		 * point, which could result in signalling the wrong/previous
> > -		 * pCPU.  But if that happens the vCPU is guaranteed to do a
> > -		 * VMRUN (after being migrated) and thus will process pending
> > -		 * interrupts, i.e. a doorbell is not needed (and the spurious
> > -		 * one is harmless).
> > -		 */
> > -		if (cpu != get_cpu())
> > -			wrmsrl(SVM_AVIC_DOORBELL, kvm_cpu_get_apicid(cpu));
> > -		put_cpu();
> > -	} else {
> > -		/*
> > -		 * Wake the vCPU if it was blocking.  KVM will then detect the
> > -		 * pending IRQ when checking if the vCPU has a wake event.
> > -		 */
> > -		kvm_vcpu_wake_up(vcpu);
> > -	}
> > -
> > +	avic_kick_target_vcpu(vcpu);
> >  	return 0;
> >  }
> >  
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 85127b3e3690b..81a74d86ee5eb 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -9869,7 +9869,9 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
> >  	 * result in virtual interrupt delivery.
> >  	 */
> >  	local_irq_disable();
> > -	vcpu->mode = IN_GUEST_MODE;
> > +
> > +	/* Store vcpu->apicv_active before vcpu->mode. */
> > +	smp_store_release(&vcpu->mode, IN_GUEST_MODE);
> >  
> >  	srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx);
> >  
> > -- 
> > 2.26.3
> > 
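
As background to the ordering argument in the patch: the
smp_store_release()/smp_load_acquire() pairing can be modeled in userspace
with C11 atomics. The sketch below is an illustration only, not kernel code;
the struct, the field names, and the three delivery paths merely mirror
their KVM counterparts, under the patch's assumption that apicv_active is
updated before the release-store of vcpu->mode.

	#include <stdatomic.h>
	#include <stdbool.h>
	#include <stdio.h>

	/* Hypothetical stand-ins for the KVM fields. */
	enum vcpu_mode { OUTSIDE_GUEST_MODE, IN_GUEST_MODE };

	struct vcpu {
		_Atomic enum vcpu_mode mode;
		_Atomic bool apicv_active;
	};

	/*
	 * Entry side, modeling vcpu_enter_guest(): any update of
	 * apicv_active happens before this release-store, so a reader
	 * that acquire-loads IN_GUEST_MODE must observe it too.
	 */
	static void enter_guest(struct vcpu *vcpu)
	{
		atomic_store_explicit(&vcpu->mode, IN_GUEST_MODE,
				      memory_order_release);
	}

	/*
	 * Delivery side, modeling avic_kick_target_vcpu(): mode is
	 * acquire-loaded before apicv_active is read, mirroring the
	 * ordering the patch introduces.
	 */
	static void kick_target_vcpu(struct vcpu *vcpu)
	{
		bool in_guest_mode =
			atomic_load_explicit(&vcpu->mode,
					     memory_order_acquire)
				== IN_GUEST_MODE;

		if (!atomic_load_explicit(&vcpu->apicv_active,
					  memory_order_relaxed)) {
			puts("AVIC inhibited: KVM_REQ_EVENT + kick");
			return;
		}

		if (in_guest_mode)
			puts("vCPU in guest: ring the AVIC doorbell");
		else
			puts("vCPU not running: wake it up");
	}

	int main(void)
	{
		struct vcpu vcpu = {
			.mode = OUTSIDE_GUEST_MODE,
			.apicv_active = true,
		};

		kick_target_vcpu(&vcpu);   /* wake-up path */
		enter_guest(&vcpu);
		kick_target_vcpu(&vcpu);   /* doorbell path */

		/* Inhibit AVIC; the next kick must take the fallback. */
		atomic_store_explicit(&vcpu.apicv_active, false,
				      memory_order_relaxed);
		kick_target_vcpu(&vcpu);   /* KVM_REQ_EVENT fallback */
		return 0;
	}

The program is single-threaded and simply walks the three paths; the point
is the contract the kernel relies on: a deliverer that acquire-loads
IN_GUEST_MODE is also guaranteed to observe any clearing of apicv_active
that happened before the vCPU's release-store, so it cannot ring the
doorbell for a vCPU whose AVIC was inhibited before guest entry.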