Marcelo Tosatti wrote on 2013-02-25:
> On Sun, Feb 24, 2013 at 01:17:59PM +0000, Zhang, Yang Z wrote:
>> Marcelo Tosatti wrote on 2013-02-24:
>>> On Sat, Feb 23, 2013 at 03:16:11PM +0000, Zhang, Yang Z wrote:
>>>> Marcelo Tosatti wrote on 2013-02-23:
>>>>> On Sat, Feb 23, 2013 at 02:05:28PM +0000, Zhang, Yang Z wrote:
>>>>>> Marcelo Tosatti wrote on 2013-02-23:
>>>>>>>
>>>>>>> On Fri, Feb 22, 2013 at 09:42:32PM +0800, Yang Zhang wrote:
>>>>>>>> From: Yang Zhang <yang.z.zhang@xxxxxxxxx>
>>>>>>>>
>>>>>>>> Posted Interrupt allows APIC interrupts to be injected into the
>>>>>>>> guest directly, without any vmexit.
>>>>>>>>
>>>>>>>> - When delivering an interrupt to the guest, if the target vcpu is
>>>>>>>>   running, update the Posted-Interrupt Requests bitmap and send a
>>>>>>>>   notification event to the vcpu. The vcpu will then handle the
>>>>>>>>   interrupt automatically, without any software involvement.
>>>>>>>> - If the target vcpu is not running, or there is already a
>>>>>>>>   notification event pending in the vcpu, do nothing. The
>>>>>>>>   interrupt will be handled on the next vm entry.
>>>>>>>>
>>>>>>>> Signed-off-by: Yang Zhang <yang.z.zhang@xxxxxxxxx>
>>>>>>>> ---
>>>>>>>>
>>>>>>>> diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c
>>>>>>>> index e4595f1..64616cc 100644
>>>>>>>> --- a/arch/x86/kernel/irq.c
>>>>>>>> +++ b/arch/x86/kernel/irq.c
>>>>>>>> @@ -228,6 +228,26 @@ void smp_x86_platform_ipi(struct pt_regs *regs)
>>>>>>>>  	set_irq_regs(old_regs);
>>>>>>>>  }
>>>>>>>> +#ifdef CONFIG_HAVE_KVM
>>>>>>>> +/*
>>>>>>>> + * Handler for POSTED_INTERRUPT_VECTOR.
>>>>>>>> + */
>>>>>>>> +void smp_kvm_posted_intr_ipi(struct pt_regs *regs)
>>>>>>>> +{
>>>>>>>> +	struct pt_regs *old_regs = set_irq_regs(regs);
>>>>>>>> +
>>>>>>>> +	ack_APIC_irq();
>>>>>>>> +
>>>>>>>> +	irq_enter();
>>>>>>>> +
>>>>>>>> +	exit_idle();
>>>>>>>> +
>>>>>>>> +	irq_exit();
>>>>>>>> +
>>>>>>>> +	set_irq_regs(old_regs);
>>>>>>>> +}
>>>>>>>> +#endif
>>>>>>>> +
>>>>>>>
>>>>>>> Add per-cpu counters, similarly to other handlers in the same file.
>>>>>> Sure.
>>>>>>
>>>>>>>> @@ -379,6 +398,7 @@ static inline int apic_find_highest_irr(struct kvm_lapic *apic)
>>>>>>>>  	if (!apic->irr_pending)
>>>>>>>>  		return -1;
>>>>>>>> +	kvm_x86_ops->sync_pir_to_irr(apic->vcpu);
>>>>>>>>  	result = apic_search_irr(apic);
>>>>>>>>  	ASSERT(result == -1 || result >= 16);
>>>>>>>
>>>>>>> kvm_x86_ops->sync_pir_to_irr() should be called from
>>>>>>> vcpu_enter_guest, before inject_pending_event:
>>>>>>>
>>>>>>> 	if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) {
>>>>>>> 		->
>>>>>>> 		inject_pending_event
>>>>>>> 		...
>>>>>>> 	}
>>>>>>
>>>>>> Some other places will search the IRR to do something, like
>>>>>> kvm_cpu_has_interrupt(). So we must sync PIR to IRR before touching
>>>>>> the IRR, not just before entering the guest.
>>>>>
>>>>> I see. The only call site that needs IRR/PIR information with posted
>>>>> interrupt enabled is kvm_arch_vcpu_runnable, correct?
>>>>
>>>> Yes.
>>>>
>>>>> If so, can we contain reading PIR to only that location?
>>>>
>>>> Yes, we can do it. Actually, the reason I do PIR->IRR here is to
>>>> avoid future misuse of checking for a pending interrupt just by
>>>> calling kvm_vcpu_has_interrupt().
>>>
>>> Yes, I see.
>>>
>>>> Also, there may need to be an extra callback to check the PIR.
>>>> So you think your suggestion is better?
>>>
>>> Because it respects the standard request processing mechanism, yes.
>>
>> Sure. Will change it in the next patch.
>>
>>>>> And have PIR->IRR sync at KVM_REQ_EVENT processing only (to respect
>>>>> the standard way of event processing and also reduce reading the
>>>>> PIR).
>>>>
>>>> Since we will check the ON bit before each read of the PIR, the cost
>>>> can be ignored.
>>>
>>> Note reading the ON bit is not OK, because if the injection path did
>>> not see vcpu->mode == IN_GUEST_MODE, the ON bit will not be set.
>>
>> In my patch, it will set the ON bit before checking vcpu->mode, so
>> it's OK. Actually, I think the ON bit must be set unconditionally
>> after changing the PIR, regardless of vcpu->mode.
>>
>>>>>
>>>>> So it is possible that a PIR IPI is delivered and handled before
>>>>> guest entry.
>>>>>
>>>>> By clearing the PIR notification bit after local_irq_disable, but
>>>>> before the last check of vcpu->requests, we guarantee that a PIR
>>>>> IPI is never lost.
>>>>>
>>>>> Makes sense? (please check the logic, it might be flawed).
>>>>
>>>> I am not sure whether I understand you. But I don't think the IPI
>>>> will be lost in the current patch:
>>>>
>>>> 	if (!pi_test_and_set_on(&vmx->pi_desc) &&
>>>> 	    (vcpu->mode == IN_GUEST_MODE)) {
>>>> 		kvm_make_request(KVM_REQ_EVENT, vcpu);
>>>> 		apic->send_IPI_mask(get_cpu_mask(vcpu->cpu),
>>>> 				    POSTED_INTR_VECTOR);
>>>> 		*delivered = true;
>>>> 	}
>>>>
>>>> vcpu entry has:
>>>> 	vcpu->mode = GUEST_MODE
>>>> 	local irq disable
>>>> 	check request
>>>>
>>>> We will send the IPI after making the request; if the IPI is
>>>> received after setting guest mode but before disabling irqs, it
>>>> will still be handled by the following request check.
>>>> Please correct me if I am wrong.
>
> p1)
>
>>> cpu0                        cpu1                      vcpu0
>>>                             test_and_set_bit(PIR-A)
>>>                             set KVM_REQ_EVENT
>>>                                                       process REQ_EVENT
>>>                                                       PIR-A->IRR
>>>
>>>                                                       vcpu->mode=IN_GUEST
>>>
>>> if (vcpu0->guest_mode)
>>>         if (!t_a_s_bit(PIR notif))
>>>                 send IPI
>>>                                                       linux_pir_handler
>>>
>>>                             t_a_s_b(PIR-B)=1
>>>                             no PIR IPI sent
>
> p2)
>
>> No, this exists with your implementation, not mine.
>> As I said, in this patch, it will make the request after vcpu->mode
>> == guest mode, then send the IPI. Even if the IPI is lost, the
>> request will still be tracked, because we have the following check:
>>
>> 	vcpu->mode = GUEST_MODE
>> 	(IPI may arrive here and be lost)
>> 	local irq disable
>> 	check request (this ensures the PIR modification will be picked
>> 	up by the vcpu before vmentry)
>
> Please read the sequence above again, between p1) and p2). Yes, if the
> IPI is lost, the request _for the CPU which sent the PIR IPI_ is
> guaranteed to be processed, but not the request for another cpu (cpu1).
If no PIR IPI is sent on cpu1, then delivered is false and it will fall
back to the old logic in __apic_accept_irq():

	if (!delivered) {
		kvm_make_request(KVM_REQ_EVENT, vcpu);
		kvm_vcpu_kick(vcpu);
	}

This can solve the problem you mentioned above. Right?

> If in fact the scenario is not possible, then please point out between
> p1) and p2) where the error in representation is.
>
> Note there are 3 contexts: CPU0, CPU1, VCPU0 (a virtual CPU on some
> CPU != 0 and != 1).

>>>>> Sync PIR->IRR uses xchg, which is locked and atomic, so I can't
>>>>> see the need for the spinlock there.
>>>>
>>>> In function sync_pir_to_irr():
>>>> 	tmp_pir = xchg(pir, 0)
>>>> 	xchg(tmp_pir, irr)
>>>>
>>>> This is not atomic as a whole; the lock is still needed.
>>>
>>> The IRR is only accessed by the local vcpu context; only the PIR is
>>> accessed concurrently, therefore only PIR accesses must be atomic.
>>> The xchg instruction is locked and atomic.
>>
>> The same problem we discussed before exists here. Consider this
>> (before delivering the posted IPI, the IRR is 0):
>>
>> cpu0                      cpu1        vcpu0
>> delivery_posted_intr()
>>                                       sync_pir_to_irr():
>>                                         tmp_pir = xchg(pir, 0)
>>                                         (pir is cleared)
>> check pir (pass)
>> check irr (pass)
>>                                         xchg(tmp_pir, irr)
>> injected = true
>> set pir
>>
>> Same problem: both pir and irr are set, but it was reported as
>> injected. We still need the lock to protect it.
>
> See:
>
> cpu0
>
> check pir (pass)
> check irr (pass)
> injected = !test_and_set_bit(pir)
>
> versus
>
> cpu1
>
> xchg(pir)
>
> No information can be lost because all accesses to shared data are
> atomic.

Sorry, I cannot get your point. Why does 'xchg(pir)' solve the problem
I mentioned above? Can you elaborate? The spinlock I used is trying to
protect the whole procedure of syncing PIR to IRR, not just the
accesses to the PIR.

Best regards,
Yang
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html