On 2017/7/28 12:22, Longpeng (Mike) wrote: > > > On 2017/6/6 18:57, Paolo Bonzini wrote: > >> The simplify part: do not touch pi_desc.nv, we can set it when the >> VCPU is first created. Likewise, pi_desc.sn is only handled by >> vmx_vcpu_pi_load, do not touch it in __pi_post_block. >> >> The fix part: do not check kvm_arch_has_assigned_device, instead >> check the SN bit to figure out whether vmx_vcpu_pi_put ran before. >> This matches what the previous patch did in pi_post_block. >> >> Cc: Longpeng (Mike) <longpeng2@xxxxxxxxxx> >> Cc: Huangweidong <weidong.huang@xxxxxxxxxx> >> Cc: Gonglei <arei.gonglei@xxxxxxxxxx> >> Cc: wangxin <wangxinxin.wang@xxxxxxxxxx> >> Cc: Radim Krčmář <rkrcmar@xxxxxxxxxx> >> Signed-off-by: Paolo Bonzini <pbonzini@xxxxxxxxxx> >> --- >> arch/x86/kvm/vmx.c | 68 ++++++++++++++++++++++++++++-------------------------- >> 1 file changed, 35 insertions(+), 33 deletions(-) >> >> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c >> index 0f4714fe4908..81047f373747 100644 >> --- a/arch/x86/kvm/vmx.c >> +++ b/arch/x86/kvm/vmx.c >> @@ -2184,43 +2184,41 @@ static void vmx_vcpu_pi_load(struct kvm_vcpu *vcpu, int cpu) >> struct pi_desc old, new; >> unsigned int dest; >> >> - if (!kvm_arch_has_assigned_device(vcpu->kvm) || >> - !irq_remapping_cap(IRQ_POSTING_CAP) || >> - !kvm_vcpu_apicv_active(vcpu)) >> + /* >> + * In case of hot-plug or hot-unplug, we may have to undo >> + * vmx_vcpu_pi_put even if there is no assigned device. And we >> + * always keep PI.NDST up to date for simplicity: it makes the >> + * code easier, and CPU migration is not a fast path. >> + */ >> + if (!pi_test_sn(pi_desc) && vcpu->cpu == cpu) >> + return; > > > Hi Paolo, > > I'm confused with the following scenario: > > (suppose the VM has a assigned devices) > step 1. the running vcpu is be preempted > --> vmx_vcpu_pi_put [ SET pi.sn ] > step 2. hot-unplug the assigned devices > step 3. the vcpu is scheduled in > --> vmx_vcpu_pi_load [ CLEAR pi.sn ] > step 4. the running vcpu is be preempted again > --> vmx_vcpu_pi_put [ direct return ] > step 5. the vcpu is migrated to another pcpu > step 6. the vcpu is scheduled in > --> vmx_vcpu_pi_load [ above check fails and > continue to execute the follow parts ] > > I think vmx_vcpu_pi_load should return direct in step6, because > vmx_vcpu_pi_put in step4 did nothing. > So maybe the above check has a potential problem. Oh! Sorry, please just ignore the above. I made a mistaken. > > Please kindly figure out if I misunderstand anything important :) > > -- > Regards, > Longpeng(Mike) > >> + >> + /* >> + * First handle the simple case where no cmpxchg is necessary; just >> + * allow posting non-urgent interrupts. >> + * >> + * If the 'nv' field is POSTED_INTR_WAKEUP_VECTOR, do not change >> + * PI.NDST: pi_post_block will do it for us and the wakeup_handler >> + * expects the VCPU to be on the blocked_vcpu_list that matches >> + * PI.NDST. >> + */ >> + if (pi_desc->nv == POSTED_INTR_WAKEUP_VECTOR || >> + vcpu->cpu == cpu) { >> + pi_clear_sn(pi_desc); >> return; >> + } >> >> + /* The full case. */ >> do { >> old.control = new.control = pi_desc->control; >> >> - /* >> - * If 'nv' field is POSTED_INTR_WAKEUP_VECTOR, there >> - * are two possible cases: >> - * 1. After running 'pre_block', context switch >> - * happened. For this case, 'sn' was set in >> - * vmx_vcpu_put(), so we need to clear it here. >> - * 2. After running 'pre_block', we were blocked, >> - * and woken up by some other guy. For this case, >> - * we don't need to do anything, 'pi_post_block' >> - * will do everything for us. However, we cannot >> - * check whether it is case #1 or case #2 here >> - * (maybe, not needed), so we also clear sn here, >> - * I think it is not a big deal. >> - */ >> - if (pi_desc->nv != POSTED_INTR_WAKEUP_VECTOR) { >> - if (vcpu->cpu != cpu) { >> - dest = cpu_physical_id(cpu); >> - >> - if (x2apic_enabled()) >> - new.ndst = dest; >> - else >> - new.ndst = (dest << 8) & 0xFF00; >> - } >> + dest = cpu_physical_id(cpu); >> >> - /* set 'NV' to 'notification vector' */ >> - new.nv = POSTED_INTR_VECTOR; >> - } >> + if (x2apic_enabled()) >> + new.ndst = dest; >> + else >> + new.ndst = (dest << 8) & 0xFF00; >> >> - /* Allow posting non-urgent interrupts */ >> new.sn = 0; >> } while (cmpxchg(&pi_desc->control, old.control, >> new.control) != old.control); >> @@ -9259,6 +9257,13 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id) >> >> vmx->msr_ia32_feature_control_valid_bits = FEATURE_CONTROL_LOCKED; >> >> + /* >> + * Enforce invariant: pi_desc.nv is always either POSTED_INTR_VECTOR >> + * or POSTED_INTR_WAKEUP_VECTOR. >> + */ >> + vmx->pi_desc.nv = POSTED_INTR_VECTOR; >> + vmx->pi_desc.sn = 1; >> + >> return &vmx->vcpu; >> >> free_vmcs: >> @@ -11249,9 +11254,6 @@ static void __pi_post_block(struct kvm_vcpu *vcpu) >> else >> new.ndst = (dest << 8) & 0xFF00; >> >> - /* Allow posting non-urgent interrupts */ >> - new.sn = 0; >> - >> /* set 'NV' to 'notification vector' */ >> new.nv = POSTED_INTR_VECTOR; >> } while (cmpxchg(&pi_desc->control, old.control, > > -- Regards, Longpeng(Mike)