On 2017/6/6 18:57, Paolo Bonzini wrote: > The simplify part: do not touch pi_desc.nv, we can set it when the > VCPU is first created. Likewise, pi_desc.sn is only handled by > vmx_vcpu_pi_load, do not touch it in __pi_post_block. > > The fix part: do not check kvm_arch_has_assigned_device, instead > check the SN bit to figure out whether vmx_vcpu_pi_put ran before. > This matches what the previous patch did in pi_post_block. > > Cc: Longpeng (Mike) <longpeng2@xxxxxxxxxx> > Cc: Huangweidong <weidong.huang@xxxxxxxxxx> > Cc: Gonglei <arei.gonglei@xxxxxxxxxx> > Cc: wangxin <wangxinxin.wang@xxxxxxxxxx> > Cc: Radim Krčmář <rkrcmar@xxxxxxxxxx> > Signed-off-by: Paolo Bonzini <pbonzini@xxxxxxxxxx> > --- > arch/x86/kvm/vmx.c | 68 ++++++++++++++++++++++++++++-------------------------- > 1 file changed, 35 insertions(+), 33 deletions(-) > > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c > index 0f4714fe4908..81047f373747 100644 > --- a/arch/x86/kvm/vmx.c > +++ b/arch/x86/kvm/vmx.c > @@ -2184,43 +2184,41 @@ static void vmx_vcpu_pi_load(struct kvm_vcpu *vcpu, int cpu) > struct pi_desc old, new; > unsigned int dest; > > - if (!kvm_arch_has_assigned_device(vcpu->kvm) || > - !irq_remapping_cap(IRQ_POSTING_CAP) || > - !kvm_vcpu_apicv_active(vcpu)) > + /* > + * In case of hot-plug or hot-unplug, we may have to undo > + * vmx_vcpu_pi_put even if there is no assigned device. And we > + * always keep PI.NDST up to date for simplicity: it makes the > + * code easier, and CPU migration is not a fast path. > + */ > + if (!pi_test_sn(pi_desc) && vcpu->cpu == cpu) > + return; Hi Paolo, I'm confused with the following scenario: (suppose the VM has a assigned devices) step 1. the running vcpu is be preempted --> vmx_vcpu_pi_put [ SET pi.sn ] step 2. hot-unplug the assigned devices step 3. the vcpu is scheduled in --> vmx_vcpu_pi_load [ CLEAR pi.sn ] step 4. the running vcpu is be preempted again --> vmx_vcpu_pi_put [ direct return ] step 5. the vcpu is migrated to another pcpu step 6. the vcpu is scheduled in --> vmx_vcpu_pi_load [ above check fails and continue to execute the follow parts ] I think vmx_vcpu_pi_load should return direct in step6, because vmx_vcpu_pi_put in step4 did nothing. So maybe the above check has a potential problem. Please kindly figure out if I misunderstand anything important :) -- Regards, Longpeng(Mike) > + > + /* > + * First handle the simple case where no cmpxchg is necessary; just > + * allow posting non-urgent interrupts. > + * > + * If the 'nv' field is POSTED_INTR_WAKEUP_VECTOR, do not change > + * PI.NDST: pi_post_block will do it for us and the wakeup_handler > + * expects the VCPU to be on the blocked_vcpu_list that matches > + * PI.NDST. > + */ > + if (pi_desc->nv == POSTED_INTR_WAKEUP_VECTOR || > + vcpu->cpu == cpu) { > + pi_clear_sn(pi_desc); > return; > + } > > + /* The full case. */ > do { > old.control = new.control = pi_desc->control; > > - /* > - * If 'nv' field is POSTED_INTR_WAKEUP_VECTOR, there > - * are two possible cases: > - * 1. After running 'pre_block', context switch > - * happened. For this case, 'sn' was set in > - * vmx_vcpu_put(), so we need to clear it here. > - * 2. After running 'pre_block', we were blocked, > - * and woken up by some other guy. For this case, > - * we don't need to do anything, 'pi_post_block' > - * will do everything for us. However, we cannot > - * check whether it is case #1 or case #2 here > - * (maybe, not needed), so we also clear sn here, > - * I think it is not a big deal. > - */ > - if (pi_desc->nv != POSTED_INTR_WAKEUP_VECTOR) { > - if (vcpu->cpu != cpu) { > - dest = cpu_physical_id(cpu); > - > - if (x2apic_enabled()) > - new.ndst = dest; > - else > - new.ndst = (dest << 8) & 0xFF00; > - } > + dest = cpu_physical_id(cpu); > > - /* set 'NV' to 'notification vector' */ > - new.nv = POSTED_INTR_VECTOR; > - } > + if (x2apic_enabled()) > + new.ndst = dest; > + else > + new.ndst = (dest << 8) & 0xFF00; > > - /* Allow posting non-urgent interrupts */ > new.sn = 0; > } while (cmpxchg(&pi_desc->control, old.control, > new.control) != old.control); > @@ -9259,6 +9257,13 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id) > > vmx->msr_ia32_feature_control_valid_bits = FEATURE_CONTROL_LOCKED; > > + /* > + * Enforce invariant: pi_desc.nv is always either POSTED_INTR_VECTOR > + * or POSTED_INTR_WAKEUP_VECTOR. > + */ > + vmx->pi_desc.nv = POSTED_INTR_VECTOR; > + vmx->pi_desc.sn = 1; > + > return &vmx->vcpu; > > free_vmcs: > @@ -11249,9 +11254,6 @@ static void __pi_post_block(struct kvm_vcpu *vcpu) > else > new.ndst = (dest << 8) & 0xFF00; > > - /* Allow posting non-urgent interrupts */ > - new.sn = 0; > - > /* set 'NV' to 'notification vector' */ > new.nv = POSTED_INTR_VECTOR; > } while (cmpxchg(&pi_desc->control, old.control, -- Regards, Longpeng(Mike)