RE: [PATCH v3 3/4] x86, apicv: add virtual interrupt delivery support

"Zhang, Yang Z" <yang.z.zhang@xxxxxxxxx> · Thu, 6 Dec 2012 02:55:16 +0000



Gleb Natapov wrote on 2012-12-05:
> On Wed, Dec 05, 2012 at 01:51:36PM +0000, Zhang, Yang Z wrote:
>> Gleb Natapov wrote on 2012-12-05:
>>> On Wed, Dec 05, 2012 at 06:02:59AM +0000, Zhang, Yang Z wrote:
>>>> Gleb Natapov wrote on 2012-12-05:
>>>>> On Wed, Dec 05, 2012 at 01:55:17AM +0000, Zhang, Yang Z wrote:
>>>>>> Gleb Natapov wrote on 2012-12-04:
>>>>>>> On Tue, Dec 04, 2012 at 06:39:50AM +0000, Zhang, Yang Z wrote:
>>>>>>>> Gleb Natapov wrote on 2012-12-03:
>>>>>>>>> On Mon, Dec 03, 2012 at 03:01:03PM +0800, Yang Zhang wrote:
>>>>>>>>>> Virtual interrupt delivery avoids KVM to inject vAPIC interrupts
>>>>>>>>>> manually, which is fully taken care of by the hardware. This needs
>>>>>>>>>> some special awareness into existing interrupr injection path:
>>>>>>>>>> 
>>>>>>>>>> - for pending interrupt, instead of direct injection, we may need
>>>>>>>>>>   update architecture specific indicators before resuming to
>>>>>>>>>>   guest. - A pending interrupt, which is masked by ISR, should
>>>>>>>>>>   be also considered in above update action, since hardware
>>>>>>>>>>   will decide when to inject it at right time. Current
>>>>>>>>>>   has_interrupt and get_interrupt only returns a valid vector
>>>>>>>>>>   from injection p.o.v.
>>>>>>>>> Most of my previous comments still apply.
>>>>>>>>> 
>>>>>>>>>> +void kvm_set_eoi_exitmap(struct kvm_vcpu *vcpu, int vector,
>>>>>>>>>> +		int trig_mode, int always_set)
>>>>>>>>>> +{
>>>>>>>>>> +	if (kvm_x86_ops->set_eoi_exitmap)
>>>>>>>>>> +		kvm_x86_ops->set_eoi_exitmap(vcpu, vector,
>>>>>>>>>> +					trig_mode, always_set);
>>>>>>>>>> +}
>>>>>>>>>> +
>>>>>>>>>>  /*
>>>>>>>>>>   * Add a pending IRQ into lapic.
>>>>>>>>>>   * Return 1 if successfully added and 0 if discarded.
>>>>>>>>>> @@ -661,6 +669,7 @@ static int __apic_accept_irq(struct kvm_lapic
>>>>> *apic,
>>>>>>> int
>>>>>>>>> delivery_mode,
>>>>>>>>>>  		if (unlikely(!apic_enabled(apic)))
>>>>>>>>>>  			break;
>>>>>>>>>> +		kvm_set_eoi_exitmap(vcpu, vector, trig_mode, 0);
>>>>>>>>> As I said in the last review rebuild the bitmap when ioapic or irq
>>>>>>>>> notifier configuration changes, user request bit to notify vcpus to
>>>>>>>>> reload the bitmap.
>>>>>>>> It is too complicated. When program ioapic entry, we cannot get the
>>> target
>>>>> vcpu
>>>>>>> easily. We need to read destination format register and logical
>>>>>>> destination register to find out target vcpu if using logical mode.
>>>>>>> Also, we must trap every modification to the two registers to update
>>>>>>> eoi bitmap. No need to check target vcpu. Enable exit on all vcpus
>>>>>>> for the vector
>>>>>> This is wrong. As we known, modern OS uses per VCPU vector. We cannot
>>>>> ensure all vectors have same trigger mode. And what's worse, the
>>>>> vector in another vcpu is used to handle high frequency
>>>>> interrupts(like 10G NIC), then it will hurt performance.
>>>>>> 
>>>>> I never saw OSes reuse vector used by ioapic, as far as I see this
>>>> Could you point out which code does this check in Linux kernel? I don't
>>>> see any special checks when Linux kernel allocates a vector.
>>>> 
>>> arch/x86/kernel/apic/io_apic.c:create_irq_nr(). It uses
>>> apic->target_cpus() to get cpu mask. target_cpus() return mask of all
>>> online cpus. Actually you wrote arch_pi_create_irq() in PI patches to
>>> workaround this behaviour and allocated vector per cpu, no?
>> Yes, when create an new irq, it will allocate vector from all online cpus. But after
> user changes the irq affinity, then the vector will reallocate with new cpumask.
> And this will leave the vector available on other cpus.
>> 
> Since during vector allocation all cpus are checked vector will not be
> reused if it is allocated on any cpu.
Sorry, I still cannot find this check in kernel. Can you point me out to it?
Also, I do see the vector reused by MSI and IOAPIC in my system.
 
>>> Are you aware of any guest that I can run, examine ioapic/apic
>>> configuration and see that the same vector is used on different vcpus
>>> for different devices? Can you point me to it?
>>> 
> Can you answer this?
I am sure Xen will reused the IOAPIC vector.

>>>>> is not how Linux code works. Furthermore it will not work with KVM
>>>>> currently since apic eoi redirected to ioapic based on vector alone,
>>>>> not vector/vcpu pair and as far as I am aware this is how real HW works.
>>>> yes, real HW works in this way. But why it is helpful in this case?
>>> It makes it impossible to use the same vector for different devices on
>>> different cpus if the vector is delivered to at least one cpu through
>>> ioapic. It may cause spurious interrupts, it will bring havoc to our
>>> ack notifiers (albeit this is KVM's implementation problem). Also look
>>> at various comment in arch/x86/kernel/apic/io_apic.c, it looks like
>>> ioapics tend to misbehave if you look at them funny. Who knows what
>>> troubles EOIing the same vector twice on real HW may bring.
>>> 
>>>> 
>>>>>>> programmed into ioapic. Which two registers? All accesses to ioapic are
>>>>>>> trapped and reconfiguration is rare.
>>>>>> In logical mode, the destination VCPU is depend on each CPU's
> destination
>>>>> format register and logical destination register. So we must also
>>>>> trap the two registers.
>>>>>> And if it uses lowest priority delivery mode, the PPR need to be trapped
> too.
>>>>> Since PPR will change on each interrupt injection, the cost should be
>>>>> higher than current approach. No need for all of that if bitmask it
>>>>> global.
>>>> No, the bitmask is per VCPU. Also, why it will work if bitmask is global?
>>> Make in global. Why what will work?
>>> 
>>> And we need to trap format/logical destination/id registers anyway since
>>> we need to build kvm->arch.apic_map table that is used to deliver
>>> interrupts. BTW you can use this table to build per VCPU eoi bitmask
>>> too, but I am not convinced it is needed in practice.
>> Even KVM uses a simple way to implement the lowest priority delivery mode,
> we still need to trap all interrupts that use the lowest priority delivery mode.
> Because each interrupt will change CPU's priority and we need to recalculate the
> priority and iterate the whole ioapic entry to renew the eoi exiting bitmap. The
> cost should be worse than current way. I don't think it worth to do.
>> 
> Just set the bit on every vcpu that can get interrupt with lowest priority.
> 
> --
> 			Gleb.


Best regards,
Yang

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html