Marcelo Tosatti wrote on 2013-02-23:
> On Sat, Feb 23, 2013 at 02:05:28PM +0000, Zhang, Yang Z wrote:
>> Marcelo Tosatti wrote on 2013-02-23:
>>>
>>> On Fri, Feb 22, 2013 at 09:42:32PM +0800, Yang Zhang wrote:
>>>> From: Yang Zhang <yang.z.zhang@xxxxxxxxx>
>>>>
>>>> Posted Interrupt allows APIC interrupts to be injected into the
>>>> guest directly, without any vmexit.
>>>>
>>>> - When delivering an interrupt to the guest, if the target vcpu is
>>>>   running, update the Posted-interrupt requests bitmap and send a
>>>>   notification event to the vcpu. The vcpu will then handle the
>>>>   interrupt automatically, without any software involvement.
>>>> - If the target vcpu is not running, or a notification event is
>>>>   already pending in the vcpu, do nothing. The interrupt will be
>>>>   handled on the next vm entry.
>>>>
>>>> Signed-off-by: Yang Zhang <yang.z.zhang@xxxxxxxxx>
>>>> ---
>>>>
>>>> diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c
>>>> index e4595f1..64616cc 100644
>>>> --- a/arch/x86/kernel/irq.c
>>>> +++ b/arch/x86/kernel/irq.c
>>>> @@ -228,6 +228,26 @@ void smp_x86_platform_ipi(struct pt_regs *regs)
>>>>  	set_irq_regs(old_regs);
>>>>  }
>>>> +
>>>> +#ifdef CONFIG_HAVE_KVM
>>>> +/*
>>>> + * Handler for POSTED_INTERRUPT_VECTOR.
>>>> + */
>>>> +void smp_kvm_posted_intr_ipi(struct pt_regs *regs)
>>>> +{
>>>> +	struct pt_regs *old_regs = set_irq_regs(regs);
>>>> +
>>>> +	ack_APIC_irq();
>>>> +
>>>> +	irq_enter();
>>>> +
>>>> +	exit_idle();
>>>> +
>>>> +	irq_exit();
>>>> +
>>>> +	set_irq_regs(old_regs);
>>>> +}
>>>> +#endif
>>>> +
>>>
>>> Add per-cpu counters, similarly to other handlers in the same file.
>>
>> Sure.
>>
>>>> @@ -379,6 +398,7 @@ static inline int apic_find_highest_irr(struct kvm_lapic *apic)
>>>>  	if (!apic->irr_pending)
>>>>  		return -1;
>>>> +	kvm_x86_ops->sync_pir_to_irr(apic->vcpu);
>>>>  	result = apic_search_irr(apic);
>>>>  	ASSERT(result == -1 || result >= 16);
>>>
>>> kvm_x86_ops->sync_pir_to_irr() should be called from vcpu_enter_guest,
>>> before inject_pending_event.
>>>
>>>	if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) {
>>>		-> inject_pending_event
>>>		...
>>>	}
>>
>> Some other places search the irr to do something, like
>> kvm_cpu_has_interrupt(). So we must sync pir to irr before touching
>> the irr, not just before entering the guest.
>
> I see. The only call site that needs IRR/PIR information with posted
> interrupt enabled is kvm_arch_vcpu_runnable, correct?

Yes.

> If so, can we contain reading PIR to only that location?

Yes, we can do that. Actually, the reason I do pir->irr here is to
avoid future misuse where pending interrupts are checked just by
calling kvm_vcpu_has_interrupt(). Also, an extra callback may be
needed to check the pir. Do you think your suggestion is better?

> And have PIR->IRR sync at KVM_REQ_EVENT processing only (to respect the
> standard way of event processing and also reduce reading the PIR).

Since we check the ON bit before each reading of the PIR, the cost can
be ignored.

>>>> +{
>>>> +	__set_bit(vector, (unsigned long *)pi_desc->pir);
>>>> +}
>>>
>>> This must be a locked atomic operation (set_bit).
>>
>> If using a spinlock, pi_set_pir is not called concurrently, so it is
>> safe to use __set_bit.
>
> The manual demands atomic locked accesses (remember this is a remote
> access). See the posted interrupt page.

You are right.

>>>
>>> Don't see why the lock is necessary. Please use the following
>>> pseudocode for vmx_deliver_posted_interrupt:
>>>
>>> vmx_deliver_posted_interrupt(), returns 'bool injected'.
>>>
>>> 1. orig_irr = read irr from vapic page
>>> 2. if (orig_irr != 0)
>>> 3.	return false;
>>> 4. kvm_make_request(KVM_REQ_EVENT)
>>> 5. bool injected = !test_and_set_bit(PIR)
>>> 6. if (vcpu->guest_mode && injected)
>>> 7.	if (test_and_set_bit(PIR notification bit))
>>> 8.		send PIR IPI
>>> 9.
>>>	return injected
>>
>> Consider the following case:
>>
>> vcpu 0                               | vcpu 1
>> -------------------------------------+--------------------------------
>> send intr to vcpu1                   |
>> check irr                            |
>>                                      | receive a posted intr
>>                                      | pir->irr (pir is cleared,
>>                                      |           irr is set)
>> injected = test_and_set_bit = true   |
>> pir = set                            |
>>
>> Then both irr and pir have the interrupt pending; they may merge into
>> one later, but injected is reported as true. Wrong.
>
> True. Have a lock around it and at step 1 also verify if PIR is set.
> That would do it, yes?

Yes.

>>> On vcpu_enter_guest:
>>>
>>> if (kvm_check_request(KVM_REQ_EVENT)) {
>>>	pir->virr sync (*)
>>>	...
>>> }
>>> ...
>>> vcpu->mode = IN_GUEST_MODE;
>>> local_irq_disable
>>> clear pir notification bit unconditionally (*)
>>
>> Why clear the ON bit here? If there is no pir->irr operation in this
>> vmentry, clearing ON here is redundant.
>
> A PIR IPI must not be lost. Because if it is lost, interrupt injection
> can be delayed while it must be performed immediately.
>
> The vcpu entry path has:
> 1. vcpu->mode = GUEST_MODE
> 2. local_irq_disable
>
> The injection path has:
> 1. if (vcpu->guest_mode && test_and_set_bit(PIR notif bit))
>	send IPI
>
> So it is possible that a PIR IPI is delivered and handled before guest
> entry.
>
> By clearing the PIR notification bit after local_irq_disable, but
> before the last check for vcpu->requests, we guarantee that a PIR IPI
> is never lost.
>
> Makes sense? (please check the logic, it might be flawed).

I am not sure whether I understand you correctly, but I don't think the
IPI will be lost with the current patch:

	if (!pi_test_and_set_on(&vmx->pi_desc) &&
	    (vcpu->mode == IN_GUEST_MODE)) {
		kvm_make_request(KVM_REQ_EVENT, vcpu);
		apic->send_IPI_mask(get_cpu_mask(vcpu->cpu),
				    POSTED_INTR_VECTOR);
		*delivered = true;
	}

The vcpu entry path has:

	vcpu->mode = GUEST_MODE
	local irq disable
	check request

We send the IPI after making the request. If the IPI is received after
guest_mode is set but before irqs are disabled, it will still be
handled by the following request check. Please correct me if I am
wrong.
>>>
>>> Please confirm whether a spinlock is necessary with the pseudocode
>>> above.
>>
>> In the current implementation, the spinlock is necessary to avoid a
>> race condition between vcpus when delivering a posted interrupt and
>> syncing pir->irr.
>
> Sync PIR->IRR uses xchg, which is locked and atomic, so can't see the
> need for the spinlock there.

In the function sync_pir_to_irr():

	tmp_pir = xchg(pir, 0)
	xchg(tmp_pir, irr)

This is not atomic as a whole, so the lock is still needed.

> About serializing concurrent injectors, yes, you're right. OK then.
>
> Please use the pseudocode with a spinlock (the order of setting
> KVM_REQ_EVENT etc must be there).
>
>>>> +static void vmx_sync_pir_to_irr(struct kvm_vcpu *vcpu)
>>>> +{
>>>> +	struct vcpu_vmx *vmx = to_vmx(vcpu);
>>>> +	unsigned long flags;
>>>> +
>>>> +	if (!vmx_vm_has_apicv(vcpu->kvm))
>>>> +		return;
>>>> +
>>>> +	spin_lock_irqsave(&vmx->pi_lock, flags);
>>>
>>>
>>>> +	if (!pi_test_and_clear_on(&vmx->pi_desc)) {
>>>
>>> This needs to move to vcpu_enter_guest, after local_irq_disable,
>>> unconditionally, as noted.
>>>
>>>> @@ -2679,6 +2679,7 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
>>>>  static int kvm_vcpu_ioctl_get_lapic(struct kvm_vcpu *vcpu,
>>>>				    struct kvm_lapic_state *s)
>>>>  {
>>>> +	kvm_x86_ops->sync_pir_to_irr(vcpu);
>>>>  	memcpy(s->regs, vcpu->arch.apic->regs, sizeof *s);
>>>>
>>>>  	return 0;
>>>> --
>>>> 1.7.1
>>>

Best regards,
Yang

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html