Marcelo Tosatti wrote on 2013-02-25:
> On Sun, Feb 24, 2013 at 01:17:59PM +0000, Zhang, Yang Z wrote:
>> Marcelo Tosatti wrote on 2013-02-24:
>>> On Sat, Feb 23, 2013 at 03:16:11PM +0000, Zhang, Yang Z wrote:
>>>> Marcelo Tosatti wrote on 2013-02-23:
>>>>> On Sat, Feb 23, 2013 at 02:05:28PM +0000, Zhang, Yang Z wrote:
>>>>>> Marcelo Tosatti wrote on 2013-02-23:
>>>>>>>
>>>>>>> On Fri, Feb 22, 2013 at 09:42:32PM +0800, Yang Zhang wrote:
>>>>>>>> From: Yang Zhang <yang.z.zhang@xxxxxxxxx>
>>>>>>>>
>>>>>>>> Posted Interrupt allows APIC interrupts to be injected into the
>>>>>>>> guest directly, without any vmexit.
>>>>>>>>
>>>>>>>> - When delivering an interrupt to the guest, if the target vcpu is
>>>>>>>>   running, update the Posted-Interrupt Requests bitmap and send a
>>>>>>>>   notification event to the vcpu. The vcpu will then handle the
>>>>>>>>   interrupt automatically, without any software involvement.
>>>>>>>> - If the target vcpu is not running, or there is already a
>>>>>>>>   notification event pending in the vcpu, do nothing. The
>>>>>>>>   interrupt will be handled on the next vm entry.
>>>>>>>>
>>>>>>>> Signed-off-by: Yang Zhang <yang.z.zhang@xxxxxxxxx>
>>>>>>>> ---
>>>>>>>>
>>>>>>>> diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c
>>>>>>>> index e4595f1..64616cc 100644
>>>>>>>> --- a/arch/x86/kernel/irq.c
>>>>>>>> +++ b/arch/x86/kernel/irq.c
>>>>>>>> @@ -228,6 +228,26 @@ void smp_x86_platform_ipi(struct pt_regs *regs)
>>>>>>>>  	set_irq_regs(old_regs);
>>>>>>>>  }
>>>>>>>> +#ifdef CONFIG_HAVE_KVM
>>>>>>>> +/*
>>>>>>>> + * Handler for POSTED_INTERRUPT_VECTOR.
>>>>>>>> + */
>>>>>>>> +void smp_kvm_posted_intr_ipi(struct pt_regs *regs)
>>>>>>>> +{
>>>>>>>> +	struct pt_regs *old_regs = set_irq_regs(regs);
>>>>>>>> +
>>>>>>>> +	ack_APIC_irq();
>>>>>>>> +
>>>>>>>> +	irq_enter();
>>>>>>>> +
>>>>>>>> +	exit_idle();
>>>>>>>> +
>>>>>>>> +	irq_exit();
>>>>>>>> +
>>>>>>>> +	set_irq_regs(old_regs);
>>>>>>>> +}
>>>>>>>> +#endif
>>>>>>>> +
>>>>>>>
>>>>>>> Add per-cpu counters, similarly to other handlers in the same file.
>>>>>> Sure.
>>>>>>
>>>>>>>> @@ -379,6 +398,7 @@ static inline int apic_find_highest_irr(struct kvm_lapic *apic)
>>>>>>>>  	if (!apic->irr_pending)
>>>>>>>>  		return -1;
>>>>>>>> +	kvm_x86_ops->sync_pir_to_irr(apic->vcpu);
>>>>>>>>  	result = apic_search_irr(apic);
>>>>>>>>  	ASSERT(result == -1 || result >= 16);
>>>>>>>
>>>>>>> kvm_x86_ops->sync_pir_to_irr() should be called from
>>>>>>> vcpu_enter_guest, before inject_pending_event:
>>>>>>>
>>>>>>> 	if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) {
>>>>>>> 		->
>>>>>>> 		inject_pending_event
>>>>>>> 		...
>>>>>>> 	}
>>>>>>
>>>>>> Some other places will search the IRR to do something, like
>>>>>> kvm_cpu_has_interrupt(). So we must sync PIR to IRR before touching
>>>>>> the IRR, not just before entering the guest.
>>>>>
>>>>> I see. The only call site that needs IRR/PIR information with posted
>>>>> interrupt enabled is kvm_arch_vcpu_runnable, correct?
>>>>
>>>> Yes.
>>>>
>>>>> If so, can we contain reading PIR to only that location?
>>>>
>>>> Yes, we can do it. Actually, the reason I do PIR->IRR here is to
>>>> avoid future misuse of checking for a pending interrupt just by
>>>> calling kvm_vcpu_has_interrupt().
>>>
>>> Yes, I see.
>>>
>>>> Also, there may need to be an extra callback to check the PIR.
>>>> So you think your suggestion is better?
>>>
>>> Because it respects the standard request processing mechanism, yes.
>>
>> Sure. Will change it in the next patch.
>>
>>>>> And have PIR->IRR sync at KVM_REQ_EVENT processing only (to respect
>>>>> the standard way of event processing and also reduce reading the
>>>>> PIR).
>>>>
>>>> Since we will check the ON bit before each read of the PIR, the cost
>>>> can be ignored.
>>>
>>> Note reading the ON bit is not OK, because if the injection path did
>>> not see vcpu->mode == IN_GUEST_MODE, the ON bit will not be set.
>>
>> In my patch, it will set the ON bit before checking vcpu->mode, so
>> it's OK. Actually, I think the ON bit must be set unconditionally
>> after changing the PIR, regardless of vcpu->mode.
>>
>>>>>
>>>>> So it is possible that a PIR IPI is delivered and handled before
>>>>> guest entry.
>>>>>
>>>>> By clearing the PIR notification bit after local_irq_disable, but
>>>>> before the last check of vcpu->requests, we guarantee that a PIR
>>>>> IPI is never lost.
>>>>>
>>>>> Makes sense? (please check the logic, it might be flawed).
>>>>
>>>> I am not sure whether I understand you. But I don't think the IPI
>>>> will be lost in the current patch:
>>>>
>>>> 	if (!pi_test_and_set_on(&vmx->pi_desc) &&
>>>> 	    (vcpu->mode == IN_GUEST_MODE)) {
>>>> 		kvm_make_request(KVM_REQ_EVENT, vcpu);
>>>> 		apic->send_IPI_mask(get_cpu_mask(vcpu->cpu),
>>>> 				    POSTED_INTR_VECTOR);
>>>> 		*delivered = true;
>>>> 	}
>>>>
>>>> vcpu entry has:
>>>> 	vcpu->mode = GUEST_MODE
>>>> 	local irq disable
>>>> 	check request
>>>>
>>>> We will send the IPI after making the request; if the IPI is
>>>> received after setting guest mode but before disabling irqs, it
>>>> will still be handled by the following request check.
>>>> Please correct me if I am wrong.
>
> p1)
>
>>> cpu0                        cpu1                      vcpu0
>>>                             test_and_set_bit(PIR-A)
>>>                             set KVM_REQ_EVENT
>>>                                                       process REQ_EVENT
>>>                                                       PIR-A->IRR
>>>
>>>                                                       vcpu->mode=IN_GUEST
>>>
>>> if (vcpu0->guest_mode)
>>>         if (!t_a_s_bit(PIR notif))
>>>                 send IPI
>>>                                                       linux_pir_handler
>>>
>>>                             t_a_s_b(PIR-B)=1
>>>                             no PIR IPI sent
>
> p2)
>
>> No, this exists with your implementation, not mine.
>> As I said, in this patch, it will make the request after vcpu->mode
>> == guest mode, then send the IPI. Even if the IPI is lost, the
>> request will still be tracked, because we have the following check:
>>
>> 	vcpu->mode = GUEST_MODE
>> 	(IPI may arrive here and be lost)
>> 	local irq disable
>> 	check request (this ensures the PIR modification will be picked
>> 	up by the vcpu before vmentry)
>
> Please read the sequence above again, between p1) and p2). Yes, if the
> IPI is lost, the request _for the CPU which sent the PIR IPI_ is
> guaranteed to be processed, but not the request for another cpu (cpu1).
If no PIR IPI is sent on cpu1, then delivered is false and it will fall
back to the old logic in __apic_accept_irq():

	if (!delivered) {
		kvm_make_request(KVM_REQ_EVENT, vcpu);
		kvm_vcpu_kick(vcpu);
	}

This can solve the problem you mentioned above. Right?

> If in fact the scenario is not possible, then please point out between
> p1) and p2) where the error in representation is.
>
> Note there are 3 contexts: CPU0, CPU1, VCPU0 (a virtual CPU on some
> CPU != 0 and != 1).

>>>>> Sync PIR->IRR uses xchg, which is locked and atomic, so I can't
>>>>> see the need for the spinlock there.
>>>>
>>>> In function sync_pir_to_irr():
>>>> 	tmp_pir = xchg(pir, 0)
>>>> 	xchg(tmp_pir, irr)
>>>>
>>>> This is not atomic as a whole; the lock is still needed.
>>>
>>> The IRR is only accessed by the local vcpu context; only the PIR is
>>> accessed concurrently, therefore only PIR accesses must be atomic.
>>> The xchg instruction is locked and atomic.
>>
>> The same problem we discussed before exists here. Consider this
>> (before delivering the posted IPI, the IRR is 0):
>>
>> cpu0                      cpu1        vcpu0
>> delivery_posted_intr()
>>                                       sync_pir_to_irr():
>>                                         tmp_pir = xchg(pir, 0)
>>                                         (pir is cleared)
>> check pir (pass)
>> check irr (pass)
>>                                         xchg(tmp_pir, irr)
>> injected = true
>> set pir
>>
>> Same problem: both pir and irr are set, but it was reported as
>> injected. We still need the lock to protect it.
>
> See:
>
> cpu0
>
> check pir (pass)
> check irr (pass)
> injected = !test_and_set_bit(pir)
>
> versus
>
> cpu1
>
> xchg(pir)
>
> No information can be lost because all accesses to shared data are
> atomic.

Sorry, I cannot get your point. Why does 'xchg(pir)' solve the problem
I mentioned above? Can you elaborate? The spinlock I used is trying to
protect the whole procedure of syncing PIR to IRR, not just the
accesses to the PIR.

Best regards,
Yang
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html