Hi Oren, Sorry for delay response. On 3/5/19 1:15 AM, Oren Twaig wrote: > Hello Suravee, > > According to AMD's SDM, the target-not-running incomplete > ipi exit is only received if any of the destination cpus had the > not-running bit set in the avic backing page. I believe you are referring to the "isRunning" (IR) bit is in the AVIC physical APIC ID table entry. > However, not before the CPU _already_ set the relevant IRR bit > in all these cpus. Not sure what you meant here. > In this change, the patch forces KVM to send another interrupt > to the vcpu whether SVM already did that or not. Which means > the vcpu/s, under some conditions, can get an EXTRA interrupt > it never intended to get >> Example: > 1. vcpu B: Is in "not-running" state. > 2. vcpu A: Writes to the ICR to send vector 80 to vcpu B > 3. vcpu A: SVM updates vcpu B IRR with bit 80 > 4. vcpu A: SVM exits on incomplete IPI target-not-running exit. > 5. vcpu A: Now stops executing any code @ hypervisor level. > 6. vcpu B: Due to another interrupt (like lapic timer) > resumes running the guest. While handling interrupts, > it also handles interrupt vector 80 (as it's in his IRR) > 7. vcpu A: resumes executing the below code and sends > an _additional_interrupt to vcpu B. > > Overall, vcpu B got two interrupts. The second is unwanted and > not documented in the system architecture. > > Can you please elaborate more to why the implementation > below conflict with the specifications (which was the code > before this commit) ? This patch was introduced to fix an issue where the SVM driver tries to handle the step 5 above by scheduling vcpu B into _running_ state to handle the IPI from vcpu A. However, prior to this patch, vcpu B was never get scheduled to run unless there are other interrupts (e.g. timer). This should not be the case as Vcpu B should have been running regardless of other interrupts. So, I don't think step 6 and 7 above are correct. The issue was caused by the apic->irr_pending not set to true when trying to get vcpu B scheduled. This flag is checked in apic_find_highest_irr() before searching for the highest bit. To fix the issue, I decided to leverage the existing emulation code for ICR and ICR2, which in turn calls apic_send_ipi() to deliver interrupt to vpu B. However, looking a bit more closely, I notice the logic in svm_deliver_avic_intr() should also have been changed from kvm_vcpu_wake_up() to kvm_vcpu_kick() since the latter will result in clearing the IRR bit for the IPI vector when trying to send IPI as part of the following call path. vcpu_enter_guest() |-- inject_pending_event() |-- kvm_cpu_get_interrupt() |-- kvm_get_apic_interrupt() |-- apic_clear_irr() |-- apic_set_isr() |-- apic_update_ppr() .... Please see the patch below. Not sure if this would address the problem you are seeing. Thanks, Suravee diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index 24dfa6a93711..d2841c3dbc04 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -5219,11 +5256,13 @@ static void svm_deliver_avic_intr(struct kvm_vcpu *vcpu, int vec) kvm_lapic_set_irr(vec, vcpu->arch.apic); smp_mb__after_atomic(); - if (avic_vcpu_is_running(vcpu)) + if (avic_vcpu_is_running(vcpu)) { wrmsrl(SVM_AVIC_DOORBELL, kvm_cpu_get_apicid(vcpu->cpu)); - else - kvm_vcpu_wake_up(vcpu); + } else { + kvm_make_request(KVM_REQ_EVENT, vcpu); + kvm_vcpu_kick(vcpu); + } } static void svm_ir_list_del(struct vcpu_svm *svm, struct amd_iommu_pi_data *pi)