Marcelo Tosatti wrote on 2013-02-23:
> On Sat, Feb 23, 2013 at 02:05:28PM +0000, Zhang, Yang Z wrote:
>> Marcelo Tosatti wrote on 2013-02-23:
>>>
>>> On Fri, Feb 22, 2013 at 09:42:32PM +0800, Yang Zhang wrote:
>>>> From: Yang Zhang <yang.z.zhang@xxxxxxxxx>
>>>>
>>>> Posted Interrupt allows APIC interrupts to be injected into the
>>>> guest directly, without any vmexit.
>>>>
>>>> - When delivering an interrupt to the guest, if the target vcpu is
>>>>   running, update the Posted-interrupt requests bitmap and send a
>>>>   notification event to the vcpu. The vcpu will then handle the
>>>>   interrupt automatically, without any software involvement.
>>>> - If the target vcpu is not running, or a notification event is
>>>>   already pending in the vcpu, do nothing. The interrupt will be
>>>>   handled on the next vm entry.
>>>>
>>>> Signed-off-by: Yang Zhang <yang.z.zhang@xxxxxxxxx>
>>>> ---
>>>>
>>>> diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c
>>>> index e4595f1..64616cc 100644
>>>> --- a/arch/x86/kernel/irq.c
>>>> +++ b/arch/x86/kernel/irq.c
>>>> @@ -228,6 +228,26 @@ void smp_x86_platform_ipi(struct pt_regs *regs)
>>>>  	set_irq_regs(old_regs);
>>>>  }
>>>> +
>>>> +#ifdef CONFIG_HAVE_KVM
>>>> +/*
>>>> + * Handler for POSTED_INTERRUPT_VECTOR.
>>>> + */
>>>> +void smp_kvm_posted_intr_ipi(struct pt_regs *regs)
>>>> +{
>>>> +	struct pt_regs *old_regs = set_irq_regs(regs);
>>>> +
>>>> +	ack_APIC_irq();
>>>> +
>>>> +	irq_enter();
>>>> +
>>>> +	exit_idle();
>>>> +
>>>> +	irq_exit();
>>>> +
>>>> +	set_irq_regs(old_regs);
>>>> +}
>>>> +#endif
>>>> +
>>>
>>> Add per-cpu counters, similarly to other handlers in the same file.
>>
>> Sure.
>>
>>>> @@ -379,6 +398,7 @@ static inline int apic_find_highest_irr(struct kvm_lapic *apic)
>>>>  	if (!apic->irr_pending)
>>>>  		return -1;
>>>> +	kvm_x86_ops->sync_pir_to_irr(apic->vcpu);
>>>>  	result = apic_search_irr(apic);
>>>>  	ASSERT(result == -1 || result >= 16);
>>>
>>> kvm_x86_ops->sync_pir_to_irr() should be called from vcpu_enter_guest,
>>> before inject_pending_event.
>>>
>>>	if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) {
>>>		-> inject_pending_event
>>>		...
>>>	}
>>
>> Some other places search the irr to do something, like
>> kvm_cpu_has_interrupt(). So we must sync pir to irr before touching
>> the irr, not just before entering the guest.
>
> I see. The only call site that needs IRR/PIR information with posted
> interrupt enabled is kvm_arch_vcpu_runnable, correct?

Yes.

> If so, can we contain reading PIR to only that location?

Yes, we can do that. Actually, the reason I do pir->irr here is to
avoid future misuse where pending interrupts are checked just by
calling kvm_vcpu_has_interrupt(). Also, an extra callback may be
needed to check the pir. Do you think your suggestion is better?

> And have PIR->IRR sync at KVM_REQ_EVENT processing only (to respect the
> standard way of event processing and also reduce reading the PIR).

Since we check the ON bit before each reading of the PIR, the cost can
be ignored.

>>>> +{
>>>> +	__set_bit(vector, (unsigned long *)pi_desc->pir);
>>>> +}
>>>
>>> This must be a locked atomic operation (set_bit).
>>
>> If using a spinlock, pi_set_pir is not called concurrently, so it is
>> safe to use __set_bit.
>
> The manual demands atomic locked accesses (remember this is a remote
> access). See the posted interrupt page.

You are right.

>>>
>>> Don't see why the lock is necessary. Please use the following
>>> pseudocode for vmx_deliver_posted_interrupt:
>>>
>>> vmx_deliver_posted_interrupt(), returns 'bool injected'.
>>>
>>> 1. orig_irr = read irr from vapic page
>>> 2. if (orig_irr != 0)
>>> 3.	return false;
>>> 4. kvm_make_request(KVM_REQ_EVENT)
>>> 5. bool injected = !test_and_set_bit(PIR)
>>> 6. if (vcpu->guest_mode && injected)
>>> 7.	if (test_and_set_bit(PIR notification bit))
>>> 8.		send PIR IPI
>>> 9.
>>>	return injected
>>
>> Consider the following case:
>>
>> vcpu 0                               | vcpu 1
>> -------------------------------------+--------------------------------
>> send intr to vcpu1                   |
>> check irr                            |
>>                                      | receive a posted intr
>>                                      | pir->irr (pir is cleared,
>>                                      |           irr is set)
>> injected = test_and_set_bit = true   |
>> pir = set                            |
>>
>> Then both irr and pir have the interrupt pending; they may merge into
>> one later, but injected is reported as true. Wrong.
>
> True. Have a lock around it and at step 1 also verify if PIR is set.
> That would do it, yes?

Yes.

>>> On vcpu_enter_guest:
>>>
>>> if (kvm_check_request(KVM_REQ_EVENT)) {
>>>	pir->virr sync (*)
>>>	...
>>> }
>>> ...
>>> vcpu->mode = IN_GUEST_MODE;
>>> local_irq_disable
>>> clear pir notification bit unconditionally (*)
>>
>> Why clear the ON bit here? If there is no pir->irr operation in this
>> vmentry, clearing ON here is redundant.
>
> A PIR IPI must not be lost. Because if it is lost, interrupt injection
> can be delayed while it must be performed immediately.
>
> The vcpu entry path has:
> 1. vcpu->mode = GUEST_MODE
> 2. local_irq_disable
>
> The injection path has:
> 1. if (vcpu->guest_mode && test_and_set_bit(PIR notif bit))
>	send IPI
>
> So it is possible that a PIR IPI is delivered and handled before guest
> entry.
>
> By clearing the PIR notification bit after local_irq_disable, but
> before the last check for vcpu->requests, we guarantee that a PIR IPI
> is never lost.
>
> Makes sense? (please check the logic, it might be flawed).

I am not sure whether I understand you correctly, but I don't think the
IPI will be lost with the current patch:

	if (!pi_test_and_set_on(&vmx->pi_desc) &&
	    (vcpu->mode == IN_GUEST_MODE)) {
		kvm_make_request(KVM_REQ_EVENT, vcpu);
		apic->send_IPI_mask(get_cpu_mask(vcpu->cpu),
				    POSTED_INTR_VECTOR);
		*delivered = true;
	}

The vcpu entry path has:

	vcpu->mode = GUEST_MODE
	local irq disable
	check request

We send the IPI after making the request. If the IPI is received after
guest_mode is set but before irqs are disabled, it will still be
handled by the following request check. Please correct me if I am
wrong.
>>>
>>> Please confirm whether a spinlock is necessary with the pseudocode
>>> above.
>>
>> In the current implementation, the spinlock is necessary to avoid a
>> race condition between vcpus when delivering a posted interrupt and
>> syncing pir->irr.
>
> Sync PIR->IRR uses xchg, which is locked and atomic, so can't see the
> need for the spinlock there.

In the function sync_pir_to_irr():

	tmp_pir = xchg(pir, 0)
	xchg(tmp_pir, irr)

This is not atomic as a whole, so the lock is still needed.

> About serializing concurrent injectors, yes, you're right. OK then.
>
> Please use the pseudocode with a spinlock (the order of setting
> KVM_REQ_EVENT etc must be there).
>
>>>> +static void vmx_sync_pir_to_irr(struct kvm_vcpu *vcpu)
>>>> +{
>>>> +	struct vcpu_vmx *vmx = to_vmx(vcpu);
>>>> +	unsigned long flags;
>>>> +
>>>> +	if (!vmx_vm_has_apicv(vcpu->kvm))
>>>> +		return;
>>>> +
>>>> +	spin_lock_irqsave(&vmx->pi_lock, flags);
>>>
>>>
>>>> +	if (!pi_test_and_clear_on(&vmx->pi_desc)) {
>>>
>>> This needs to move to vcpu_enter_guest, after local_irq_disable,
>>> unconditionally, as noted.
>>>
>>>> @@ -2679,6 +2679,7 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
>>>>  static int kvm_vcpu_ioctl_get_lapic(struct kvm_vcpu *vcpu,
>>>>				    struct kvm_lapic_state *s)
>>>>  {
>>>> +	kvm_x86_ops->sync_pir_to_irr(vcpu);
>>>>  	memcpy(s->regs, vcpu->arch.apic->regs, sizeof *s);
>>>>
>>>>  	return 0;
>>>> --
>>>> 1.7.1
>>>

Best regards,
Yang

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html