Re: [PATCH] svm: Fix AVIC incomplete IPI emulation

"Suthikulpanit, Suravee" <Suravee.Suthikulpanit@xxxxxxx> · Mon, 11 Mar 2019 11:38:05 +0000

Hi Oren,

Sorry for delay response.

On 3/5/19 1:15 AM, Oren Twaig wrote:
> Hello Suravee,
> 
> According to AMD's SDM, the target-not-running incomplete
> ipi exit is only received if any of the destination cpus had the
> not-running bit set in the avic backing page.

I believe you are referring to the "isRunning" (IR) bit is in the
AVIC physical APIC ID table entry.

> However, not before the CPU _already_ set the relevant IRR bit
> in all these cpus.

Not sure what you meant here.

> In this change, the patch forces KVM to send another interrupt 
> to the vcpu whether SVM already did that or not. Which means
> the vcpu/s, under some conditions, can get an EXTRA interrupt
> it never intended to get >> Example:
>    1. vcpu B: Is in "not-running" state.
>    2. vcpu A: Writes to the ICR to send vector 80 to vcpu B
>    3. vcpu A: SVM updates vcpu B IRR with bit 80
>    4. vcpu A: SVM exits on incomplete IPI target-not-running exit.
>    5. vcpu A: Now stops executing any code @ hypervisor level.
>    6. vcpu B: Due to another interrupt (like lapic timer)
>       resumes running the guest. While handling interrupts,
>       it also handles interrupt vector 80 (as it's in his IRR)
>    7. vcpu A: resumes executing the below code and sends
>       an _additional_interrupt to vcpu B.
> 
> Overall, vcpu B got two interrupts. The second is unwanted and
> not documented in the system architecture.
>
> Can you please elaborate more to why the implementation
> below conflict with the specifications (which was the code
> before this commit) ?
This patch was introduced to fix an issue where the SVM driver tries to
handle the step 5 above by scheduling vcpu B into _running_ state to handle
the IPI from vcpu A. However, prior to this patch, vcpu B was never get
scheduled to run unless there are other interrupts (e.g. timer).
This should not be the case as Vcpu B should have been running regardless
of other interrupts. So, I don't think step 6 and 7 above are correct.

The issue was caused by the apic->irr_pending not set to true when trying to
get vcpu B scheduled. This flag is checked in apic_find_highest_irr() before
searching for the highest bit.

To fix the issue, I decided to leverage the existing emulation code for
ICR and ICR2, which in turn calls apic_send_ipi() to deliver interrupt to vpu B.

However, looking a bit more closely, I notice the logic in svm_deliver_avic_intr()
should also have been changed from kvm_vcpu_wake_up() to kvm_vcpu_kick()
since the latter will result in clearing the IRR bit for the IPI vector
when trying to send IPI as part of the following call path.

   vcpu_enter_guest()
     |-- inject_pending_event()
       |-- kvm_cpu_get_interrupt()
         |--  kvm_get_apic_interrupt()
           |-- apic_clear_irr()
           |-- apic_set_isr()
           |-- apic_update_ppr() ....

Please see the patch below.

Not sure if this would address the problem you are seeing.

Thanks,
Suravee

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 24dfa6a93711..d2841c3dbc04 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -5219,11 +5256,13 @@ static void svm_deliver_avic_intr(struct kvm_vcpu *vcpu, int vec)
         kvm_lapic_set_irr(vec, vcpu->arch.apic);
         smp_mb__after_atomic();

-       if (avic_vcpu_is_running(vcpu))
+       if (avic_vcpu_is_running(vcpu)) {
                 wrmsrl(SVM_AVIC_DOORBELL,
                        kvm_cpu_get_apicid(vcpu->cpu));
-       else
-               kvm_vcpu_wake_up(vcpu);
+       } else {
+               kvm_make_request(KVM_REQ_EVENT, vcpu);
+               kvm_vcpu_kick(vcpu);
+       }
  }

  static void svm_ir_list_del(struct vcpu_svm *svm, struct amd_iommu_pi_data *pi)