RE: [PATCH v4 2/2] KVM: VMX: Add Posted Interrupt supporting

"Zhang, Yang Z" <yang.z.zhang@xxxxxxxxx> · Sun, 24 Feb 2013 13:17:59 +0000

Marcelo Tosatti wrote on 2013-02-24:
> On Sat, Feb 23, 2013 at 03:16:11PM +0000, Zhang, Yang Z wrote:
>> Marcelo Tosatti wrote on 2013-02-23:
>>> On Sat, Feb 23, 2013 at 02:05:28PM +0000, Zhang, Yang Z wrote:
>>>> Marcelo Tosatti wrote on 2013-02-23:
>>>>> 
>>>>> On Fri, Feb 22, 2013 at 09:42:32PM +0800, Yang Zhang wrote:
>>>>>> From: Yang Zhang <yang.z.zhang@xxxxxxxxx>
>>>>>> 
>>>>>> Posted Interrupt allows APIC interrupts to inject into guest directly
>>>>>> without any vmexit.
>>>>>> 
>>>>>> - When delivering a interrupt to guest, if target vcpu is running,
>>>>>>   update Posted-interrupt requests bitmap and send a notification
>>>>>>   event to the vcpu. Then the vcpu will handle this interrupt
>>>>>>   automatically, without any software involvemnt. - If target vcpu is
>>>>>>   not running or there already a notification event pending in the
>>>>>>   vcpu, do nothing. The interrupt will be handled by next vm entry
>>>>>> Signed-off-by: Yang Zhang <yang.z.zhang@xxxxxxxxx>
>>>>>> ---
>>>>>> 
>>>>>> diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c
>>>>>> index e4595f1..64616cc 100644
>>>>>> --- a/arch/x86/kernel/irq.c
>>>>>> +++ b/arch/x86/kernel/irq.c
>>>>>> @@ -228,6 +228,26 @@ void smp_x86_platform_ipi(struct pt_regs *regs)
>>>>>>  	set_irq_regs(old_regs);
>>>>>>  }
>>>>>> +#ifdef CONFIG_HAVE_KVM
>>>>>> +/*
>>>>>> + * Handler for POSTED_INTERRUPT_VECTOR.
>>>>>> + */
>>>>>> +void smp_kvm_posted_intr_ipi(struct pt_regs *regs)
>>>>>> +{
>>>>>> +	struct pt_regs *old_regs = set_irq_regs(regs);
>>>>>> +
>>>>>> +	ack_APIC_irq();
>>>>>> +
>>>>>> +	irq_enter();
>>>>>> +
>>>>>> +	exit_idle();
>>>>>> +
>>>>>> +	irq_exit();
>>>>>> +
>>>>>> +	set_irq_regs(old_regs);
>>>>>> +}
>>>>>> +#endif
>>>>>> +
>>>>> 
>>>>> Add per-cpu counters, similarly to other handlers in the same file.
>>>> Sure.
>>>> 
>>>>>> @@ -379,6 +398,7 @@ static inline int apic_find_highest_irr(struct
> kvm_lapic
>>>>> *apic)
>>>>>>  	if (!apic->irr_pending) 		return -1;
>>>>>>  +	kvm_x86_ops->sync_pir_to_irr(apic->vcpu); 	result =
>>>>>>  apic_search_irr(apic); 	ASSERT(result == -1 || result >= 16);
>>>>> 
>>>>> kvm_x86_ops->sync_pir_to_irr() should be called from vcpu_enter_guest,
>>>>> before inject_pending_event.
>>>>> 
>>>>>         if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win)
> {
>>>>> 		->
>>>>> 		inject_pending_event
>>>>> 		...
>>>>> 	}
>>>> Some other places will search irr to do something, like
>>> kvm_cpu_has_interrupt(). So we must to sync pir to irr before touch
>>> irr, not just before enter guest.
>>> 
>>> I see. The only call site that needs IRR/PIR information with posted
>>> interrupt enabled is kvm_arch_vcpu_runnable, correct?
>> Yes.
>> 
>>> If so, can we contain reading PIR to only that location?
>> Yes, we can do it. Actually, why I do pir->irr here is to avoid the
>> wrong usage in future of check pending interrupt just by call
>> kvm_vcpu_has_interrupt().
> 
> Yes, i see.
> 
>> Also, there may need an extra callback to check pir.
>> So you think your suggestions is better?
> 
> Because it respects standard request processing mechanism, yes.
Sure. Will change it in next patch.

>>> And have PIR->IRR sync at KVM_REQ_EVENT processing only (to respect the
>>> standard way of event processing and also reduce reading the PIR).
>> Since we will check ON bit before each reading PIR, the cost can be
>> ignored.
> 
> Note reading ON bit is not OK, because if injection path did not see
> vcpu->mode == IN_GUEST_MODE, ON bit will not be set.
In my patch, It will set ON bit before check vcpu->mode. So it's ok.
Actually, I think the ON bit must be set unconditionally after change PIR regardless of vcpu->mode.

>>> 
>>> So it is possible that a PIR IPI is delivered and handled before guest
>>> entry.
>>> 
>>> By clearing PIR notification bit after local_irq_disable, but before the
>>> last check for vcpu->requests, we guarantee that a PIR IPI is never lost.
>>> 
>>> Makes sense? (please check the logic, it might be flawed).
>> I am not sure whether I understand you. But I don't think the IPI will
>> lost in current patch:
>> 
>> if (!pi_test_and_set_on(&vmx->pi_desc) && (vcpu->mode ==
> IN_GUEST_MODE)) {
>>                 kvm_make_request(KVM_REQ_EVENT, vcpu);
>>                 apic->send_IPI_mask(get_cpu_mask(vcpu->cpu),
>>                                     POSTED_INTR_VECTOR);
>>                 *delivered = true;
>> }
>> 
>> vcpu entry has:
>> vcpu->mode = GUEST_MODE
>> local irq disable
>> check request
>> 
>> We will send the IPI after making request, if IPI is received after set
> guest_mode before disable irq, then it still will be handled by the following check
> request.
>> Please correct me if I am wrong.
> 
> cpu0					cpu1		vcpu0
> test_and_set_bit(PIR-A)
> set KVM_REQ_EVENT
> 							process REQ_EVENT
> 							PIR-A->IRR
> 
> 							vcpu->mode=IN_GUEST
> 
> if (vcpu0->guest_mode)
> 	if (!t_a_s_bit(PIR notif))
> 		send IPI
> 							linux_pir_handler
> 
> 					t_a_s_b(PIR-B)=1
> 					no PIR IPI sent
No, this exists with your implementation not mine.
As I said, in this patch, it will make request after vcpu ==guest mode, then send ipi. Even the ipi is lost, but the request still will be tracked. Because we have the follow check:
vcpu->mode = GUEST_MODE
(ipi may arrived here and lost)
local irq disable
check request (this will ensure the pir modification will be picked up by vcpu before vmentry)

>>> 
>>> Sync PIR->IRR uses xchg, which is locked and atomic, so can't see the
>>> need for the spinlock there.
>> In function sync_pir_to_irr():
>> tmp_pir=xchg(pir,0)
>> xchg(tmp_pir, irr)
>> 
>> It is not atomically, the lock still is needed.
> 
> IRR is only accessed by local vcpu context, only PIR is accessed concurrently,
> therefore only PIR accesses must be atomic. xchg instruction is
> locked and atomic.
There has same problem as we discussed before. Consider this:
Before delivery poste ipi, the irr is 0. then:

cpu0                           cpu1  vcpu0
delivery_posted_intr()
                               sync_pir_toirr():
                                  tmp_pir=xchg(pir,0)(pir is cleared)
check pir(pass)
check irr(pass)
							   xchg(tmp_pir, irr)
injected= true
set pir  

Same problem: both pir and irr is set, but it reported as injected. Still need the lock to protect it.                                

> +void kvm_apic_update_irr(struct kvm_vcpu *vcpu, u32 *pir) +{ +      
> u32 i, pir_val; +       struct kvm_lapic *apic = vcpu->arch.apic; + +   
>    for (i = 0; i <= 7; i++) { +               pir_val = xchg(&pir[i],
> 0); +               if (pir_val) +                       *((u32
> *)(apic->regs + APIC_IRR + i * 0x10)) |= pir_val; +       } +}
> +EXPORT_SYMBOL_GPL(kvm_apic_update_irr);
> 
> Right?
>

Best regards,
Yang

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html