Re: [PATCH v4 2/2] KVM: VMX: Add Posted Interrupt supporting

Marcelo Tosatti <mtosatti@xxxxxxxxxx> · Sun, 24 Feb 2013 14:39:02 -0300

On Sun, Feb 24, 2013 at 01:17:59PM +0000, Zhang, Yang Z wrote:
> Marcelo Tosatti wrote on 2013-02-24:
> > On Sat, Feb 23, 2013 at 03:16:11PM +0000, Zhang, Yang Z wrote:
> >> Marcelo Tosatti wrote on 2013-02-23:
> >>> On Sat, Feb 23, 2013 at 02:05:28PM +0000, Zhang, Yang Z wrote:
> >>>> Marcelo Tosatti wrote on 2013-02-23:
> >>>>> 
> >>>>> On Fri, Feb 22, 2013 at 09:42:32PM +0800, Yang Zhang wrote:
> >>>>>> From: Yang Zhang <yang.z.zhang@xxxxxxxxx>
> >>>>>> 
> >>>>>> Posted Interrupt allows APIC interrupts to inject into guest directly
> >>>>>> without any vmexit.
> >>>>>> 
> >>>>>> - When delivering a interrupt to guest, if target vcpu is running,
> >>>>>>   update Posted-interrupt requests bitmap and send a notification
> >>>>>>   event to the vcpu. Then the vcpu will handle this interrupt
> >>>>>>   automatically, without any software involvemnt. - If target vcpu is
> >>>>>>   not running or there already a notification event pending in the
> >>>>>>   vcpu, do nothing. The interrupt will be handled by next vm entry
> >>>>>> Signed-off-by: Yang Zhang <yang.z.zhang@xxxxxxxxx>
> >>>>>> ---
> >>>>>> 
> >>>>>> diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c
> >>>>>> index e4595f1..64616cc 100644
> >>>>>> --- a/arch/x86/kernel/irq.c
> >>>>>> +++ b/arch/x86/kernel/irq.c
> >>>>>> @@ -228,6 +228,26 @@ void smp_x86_platform_ipi(struct pt_regs *regs)
> >>>>>>  	set_irq_regs(old_regs);
> >>>>>>  }
> >>>>>> +#ifdef CONFIG_HAVE_KVM
> >>>>>> +/*
> >>>>>> + * Handler for POSTED_INTERRUPT_VECTOR.
> >>>>>> + */
> >>>>>> +void smp_kvm_posted_intr_ipi(struct pt_regs *regs)
> >>>>>> +{
> >>>>>> +	struct pt_regs *old_regs = set_irq_regs(regs);
> >>>>>> +
> >>>>>> +	ack_APIC_irq();
> >>>>>> +
> >>>>>> +	irq_enter();
> >>>>>> +
> >>>>>> +	exit_idle();
> >>>>>> +
> >>>>>> +	irq_exit();
> >>>>>> +
> >>>>>> +	set_irq_regs(old_regs);
> >>>>>> +}
> >>>>>> +#endif
> >>>>>> +
> >>>>> 
> >>>>> Add per-cpu counters, similarly to other handlers in the same file.
> >>>> Sure.
> >>>> 
> >>>>>> @@ -379,6 +398,7 @@ static inline int apic_find_highest_irr(struct
> > kvm_lapic
> >>>>> *apic)
> >>>>>>  	if (!apic->irr_pending) 		return -1;
> >>>>>>  +	kvm_x86_ops->sync_pir_to_irr(apic->vcpu); 	result =
> >>>>>>  apic_search_irr(apic); 	ASSERT(result == -1 || result >= 16);
> >>>>> 
> >>>>> kvm_x86_ops->sync_pir_to_irr() should be called from vcpu_enter_guest,
> >>>>> before inject_pending_event.
> >>>>> 
> >>>>>         if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win)
> > {
> >>>>> 		->
> >>>>> 		inject_pending_event
> >>>>> 		...
> >>>>> 	}
> >>>> Some other places will search irr to do something, like
> >>> kvm_cpu_has_interrupt(). So we must to sync pir to irr before touch
> >>> irr, not just before enter guest.
> >>> 
> >>> I see. The only call site that needs IRR/PIR information with posted
> >>> interrupt enabled is kvm_arch_vcpu_runnable, correct?
> >> Yes.
> >> 
> >>> If so, can we contain reading PIR to only that location?
> >> Yes, we can do it. Actually, why I do pir->irr here is to avoid the
> >> wrong usage in future of check pending interrupt just by call
> >> kvm_vcpu_has_interrupt().
> > 
> > Yes, i see.
> > 
> >> Also, there may need an extra callback to check pir.
> >> So you think your suggestions is better?
> > 
> > Because it respects standard request processing mechanism, yes.
> Sure. Will change it in next patch.
>  
> >>> And have PIR->IRR sync at KVM_REQ_EVENT processing only (to respect the
> >>> standard way of event processing and also reduce reading the PIR).
> >> Since we will check ON bit before each reading PIR, the cost can be
> >> ignored.
> > 
> > Note reading ON bit is not OK, because if injection path did not see
> > vcpu->mode == IN_GUEST_MODE, ON bit will not be set.
> In my patch, It will set ON bit before check vcpu->mode. So it's ok.
> Actually, I think the ON bit must be set unconditionally after change PIR regardless of vcpu->mode.
>  
> >>> 
> >>> So it is possible that a PIR IPI is delivered and handled before guest
> >>> entry.
> >>> 
> >>> By clearing PIR notification bit after local_irq_disable, but before the
> >>> last check for vcpu->requests, we guarantee that a PIR IPI is never lost.
> >>> 
> >>> Makes sense? (please check the logic, it might be flawed).
> >> I am not sure whether I understand you. But I don't think the IPI will
> >> lost in current patch:
> >> 
> >> if (!pi_test_and_set_on(&vmx->pi_desc) && (vcpu->mode ==
> > IN_GUEST_MODE)) {
> >>                 kvm_make_request(KVM_REQ_EVENT, vcpu);
> >>                 apic->send_IPI_mask(get_cpu_mask(vcpu->cpu),
> >>                                     POSTED_INTR_VECTOR);
> >>                 *delivered = true;
> >> }
> >> 
> >> vcpu entry has:
> >> vcpu->mode = GUEST_MODE
> >> local irq disable
> >> check request
> >> 
> >> We will send the IPI after making request, if IPI is received after set
> > guest_mode before disable irq, then it still will be handled by the following check
> > request.
> >> Please correct me if I am wrong.

p1)

> > cpu0					cpu1		vcpu0
> > test_and_set_bit(PIR-A)
> > set KVM_REQ_EVENT
> > 							process REQ_EVENT
> > 							PIR-A->IRR
> > 
> > 							vcpu->mode=IN_GUEST
> > 
> > if (vcpu0->guest_mode)
> > 	if (!t_a_s_bit(PIR notif))
> > 		send IPI
> > 							linux_pir_handler
> > 
> > 					t_a_s_b(PIR-B)=1
> > 					no PIR IPI sent

p2)

> No, this exists with your implementation not mine.
> As I said, in this patch, it will make request after vcpu ==guest mode, then send ipi. Even the ipi is lost, but the request still will be tracked. Because we have the follow check:
> vcpu->mode = GUEST_MODE
> (ipi may arrived here and lost)
> local irq disable
> check request (this will ensure the pir modification will be picked up by vcpu before vmentry)

Please read the sequence above again, between p1) and p2). Yes, if the
IPI is lost, the request _for the CPU which sent PIR IPI_ is guaranteed
to be processed, but not the request for another cpu (cpu1).

If in fact the scenario is not possible, then please point out between
p1) and p2) where the error in representation is.

Note there are 3 contexes: CPU0, CPU1, VCPU0 (virtual CPU on some CPU != 0 and !=
1).

> >>> Sync PIR->IRR uses xchg, which is locked and atomic, so can't see the
> >>> need for the spinlock there.
> >> In function sync_pir_to_irr():
> >> tmp_pir=xchg(pir,0)
> >> xchg(tmp_pir, irr)
> >> 
> >> It is not atomically, the lock still is needed.
> > 
> > IRR is only accessed by local vcpu context, only PIR is accessed concurrently,
> > therefore only PIR accesses must be atomic. xchg instruction is
> > locked and atomic.
> There has same problem as we discussed before. Consider this:
> Before delivery poste ipi, the irr is 0. then:
> 
> cpu0                           cpu1  vcpu0
> delivery_posted_intr()
>                                sync_pir_toirr():
>                                   tmp_pir=xchg(pir,0)(pir is cleared)
> check pir(pass)
> check irr(pass)
> 							   xchg(tmp_pir, irr)
> injected= true
> set pir  
> 
> Same problem: both pir and irr is set, but it reported as injected. Still need the lock to protect it.                                

See:

cpu0

check pir(pass)
check irr(pass)
injected = !test_and_set_bit(pir)

versus 

cpu1 
xchg(pir)

No information can be lost because all accesses to shared data are
atomic.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html