Gleb Natapov wrote on 2012-12-05: > On Wed, Dec 05, 2012 at 01:51:36PM +0000, Zhang, Yang Z wrote: >> Gleb Natapov wrote on 2012-12-05: >>> On Wed, Dec 05, 2012 at 06:02:59AM +0000, Zhang, Yang Z wrote: >>>> Gleb Natapov wrote on 2012-12-05: >>>>> On Wed, Dec 05, 2012 at 01:55:17AM +0000, Zhang, Yang Z wrote: >>>>>> Gleb Natapov wrote on 2012-12-04: >>>>>>> On Tue, Dec 04, 2012 at 06:39:50AM +0000, Zhang, Yang Z wrote: >>>>>>>> Gleb Natapov wrote on 2012-12-03: >>>>>>>>> On Mon, Dec 03, 2012 at 03:01:03PM +0800, Yang Zhang wrote: >>>>>>>>>> Virtual interrupt delivery avoids KVM to inject vAPIC interrupts >>>>>>>>>> manually, which is fully taken care of by the hardware. This needs >>>>>>>>>> some special awareness into existing interrupr injection path: >>>>>>>>>> >>>>>>>>>> - for pending interrupt, instead of direct injection, we may need >>>>>>>>>> update architecture specific indicators before resuming to >>>>>>>>>> guest. - A pending interrupt, which is masked by ISR, should >>>>>>>>>> be also considered in above update action, since hardware >>>>>>>>>> will decide when to inject it at right time. Current >>>>>>>>>> has_interrupt and get_interrupt only returns a valid vector >>>>>>>>>> from injection p.o.v. >>>>>>>>> Most of my previous comments still apply. >>>>>>>>> >>>>>>>>>> +void kvm_set_eoi_exitmap(struct kvm_vcpu *vcpu, int vector, >>>>>>>>>> + int trig_mode, int always_set) >>>>>>>>>> +{ >>>>>>>>>> + if (kvm_x86_ops->set_eoi_exitmap) >>>>>>>>>> + kvm_x86_ops->set_eoi_exitmap(vcpu, vector, >>>>>>>>>> + trig_mode, always_set); >>>>>>>>>> +} >>>>>>>>>> + >>>>>>>>>> /* >>>>>>>>>> * Add a pending IRQ into lapic. >>>>>>>>>> * Return 1 if successfully added and 0 if discarded. >>>>>>>>>> @@ -661,6 +669,7 @@ static int __apic_accept_irq(struct kvm_lapic >>>>> *apic, >>>>>>> int >>>>>>>>> delivery_mode, >>>>>>>>>> if (unlikely(!apic_enabled(apic))) >>>>>>>>>> break; >>>>>>>>>> + kvm_set_eoi_exitmap(vcpu, vector, trig_mode, 0); >>>>>>>>> As I said in the last review rebuild the bitmap when ioapic or irq >>>>>>>>> notifier configuration changes, user request bit to notify vcpus to >>>>>>>>> reload the bitmap. >>>>>>>> It is too complicated. When program ioapic entry, we cannot get the >>> target >>>>> vcpu >>>>>>> easily. We need to read destination format register and logical >>>>>>> destination register to find out target vcpu if using logical mode. >>>>>>> Also, we must trap every modification to the two registers to update >>>>>>> eoi bitmap. No need to check target vcpu. Enable exit on all vcpus >>>>>>> for the vector >>>>>> This is wrong. As we known, modern OS uses per VCPU vector. We cannot >>>>> ensure all vectors have same trigger mode. And what's worse, the >>>>> vector in another vcpu is used to handle high frequency >>>>> interrupts(like 10G NIC), then it will hurt performance. >>>>>> >>>>> I never saw OSes reuse vector used by ioapic, as far as I see this >>>> Could you point out which code does this check in Linux kernel? I don't >>>> see any special checks when Linux kernel allocates a vector. >>>> >>> arch/x86/kernel/apic/io_apic.c:create_irq_nr(). It uses >>> apic->target_cpus() to get cpu mask. target_cpus() return mask of all >>> online cpus. Actually you wrote arch_pi_create_irq() in PI patches to >>> workaround this behaviour and allocated vector per cpu, no? >> Yes, when create an new irq, it will allocate vector from all online cpus. But after > user changes the irq affinity, then the vector will reallocate with new cpumask. > And this will leave the vector available on other cpus. >> > Since during vector allocation all cpus are checked vector will not be > reused if it is allocated on any cpu. Sorry, I still cannot find this check in kernel. Can you point me out to it? Also, I do see the vector reused by MSI and IOAPIC in my system. >>> Are you aware of any guest that I can run, examine ioapic/apic >>> configuration and see that the same vector is used on different vcpus >>> for different devices? Can you point me to it? >>> > Can you answer this? I am sure Xen will reused the IOAPIC vector. >>>>> is not how Linux code works. Furthermore it will not work with KVM >>>>> currently since apic eoi redirected to ioapic based on vector alone, >>>>> not vector/vcpu pair and as far as I am aware this is how real HW works. >>>> yes, real HW works in this way. But why it is helpful in this case? >>> It makes it impossible to use the same vector for different devices on >>> different cpus if the vector is delivered to at least one cpu through >>> ioapic. It may cause spurious interrupts, it will bring havoc to our >>> ack notifiers (albeit this is KVM's implementation problem). Also look >>> at various comment in arch/x86/kernel/apic/io_apic.c, it looks like >>> ioapics tend to misbehave if you look at them funny. Who knows what >>> troubles EOIing the same vector twice on real HW may bring. >>> >>>> >>>>>>> programmed into ioapic. Which two registers? All accesses to ioapic are >>>>>>> trapped and reconfiguration is rare. >>>>>> In logical mode, the destination VCPU is depend on each CPU's > destination >>>>> format register and logical destination register. So we must also >>>>> trap the two registers. >>>>>> And if it uses lowest priority delivery mode, the PPR need to be trapped > too. >>>>> Since PPR will change on each interrupt injection, the cost should be >>>>> higher than current approach. No need for all of that if bitmask it >>>>> global. >>>> No, the bitmask is per VCPU. Also, why it will work if bitmask is global? >>> Make in global. Why what will work? >>> >>> And we need to trap format/logical destination/id registers anyway since >>> we need to build kvm->arch.apic_map table that is used to deliver >>> interrupts. BTW you can use this table to build per VCPU eoi bitmask >>> too, but I am not convinced it is needed in practice. >> Even KVM uses a simple way to implement the lowest priority delivery mode, > we still need to trap all interrupts that use the lowest priority delivery mode. > Because each interrupt will change CPU's priority and we need to recalculate the > priority and iterate the whole ioapic entry to renew the eoi exiting bitmap. The > cost should be worse than current way. I don't think it worth to do. >> > Just set the bit on every vcpu that can get interrupt with lowest priority. > > -- > Gleb. Best regards, Yang -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html