Re: [PATCH v3 3/4] x86, apicv: add virtual interrupt delivery support

Gleb Natapov <gleb@xxxxxxxxxx> · Wed, 5 Dec 2012 16:00:23 +0200



On Wed, Dec 05, 2012 at 01:51:36PM +0000, Zhang, Yang Z wrote:
> Gleb Natapov wrote on 2012-12-05:
> > On Wed, Dec 05, 2012 at 06:02:59AM +0000, Zhang, Yang Z wrote:
> >> Gleb Natapov wrote on 2012-12-05:
> >>> On Wed, Dec 05, 2012 at 01:55:17AM +0000, Zhang, Yang Z wrote:
> >>>> Gleb Natapov wrote on 2012-12-04:
> >>>>> On Tue, Dec 04, 2012 at 06:39:50AM +0000, Zhang, Yang Z wrote:
> >>>>>> Gleb Natapov wrote on 2012-12-03:
> >>>>>>> On Mon, Dec 03, 2012 at 03:01:03PM +0800, Yang Zhang wrote:
> >>>>>>>> Virtual interrupt delivery avoids KVM to inject vAPIC interrupts
> >>>>>>>> manually, which is fully taken care of by the hardware. This needs
> >>>>>>>> some special awareness into existing interrupr injection path:
> >>>>>>>> 
> >>>>>>>> - for pending interrupt, instead of direct injection, we may need
> >>>>>>>>   update architecture specific indicators before resuming to guest. -
> >>>>>>>>   A pending interrupt, which is masked by ISR, should be also
> >>>>>>>>   considered in above update action, since hardware will decide when
> >>>>>>>>   to inject it at right time. Current has_interrupt and get_interrupt
> >>>>>>>>   only returns a valid vector from injection p.o.v.
> >>>>>>> Most of my previous comments still apply.
> >>>>>>> 
> >>>>>>>> +void kvm_set_eoi_exitmap(struct kvm_vcpu *vcpu, int vector,
> >>>>>>>> +		int trig_mode, int always_set)
> >>>>>>>> +{
> >>>>>>>> +	if (kvm_x86_ops->set_eoi_exitmap)
> >>>>>>>> +		kvm_x86_ops->set_eoi_exitmap(vcpu, vector,
> >>>>>>>> +					trig_mode, always_set);
> >>>>>>>> +}
> >>>>>>>> +
> >>>>>>>>  /*
> >>>>>>>>   * Add a pending IRQ into lapic.
> >>>>>>>>   * Return 1 if successfully added and 0 if discarded.
> >>>>>>>> @@ -661,6 +669,7 @@ static int __apic_accept_irq(struct kvm_lapic
> >>> *apic,
> >>>>> int
> >>>>>>> delivery_mode,
> >>>>>>>>  		if (unlikely(!apic_enabled(apic)))
> >>>>>>>>  			break;
> >>>>>>>> +		kvm_set_eoi_exitmap(vcpu, vector, trig_mode, 0);
> >>>>>>> As I said in the last review rebuild the bitmap when ioapic or irq
> >>>>>>> notifier configuration changes, user request bit to notify vcpus to
> >>>>>>> reload the bitmap.
> >>>>>> It is too complicated. When program ioapic entry, we cannot get the
> > target
> >>> vcpu
> >>>>> easily. We need to read destination format register and logical
> >>>>> destination register to find out target vcpu if using logical mode.
> >>>>> Also, we must trap every modification to the two registers to update
> >>>>> eoi bitmap. No need to check target vcpu. Enable exit on all vcpus
> >>>>> for the vector
> >>>> This is wrong. As we known, modern OS uses per VCPU vector. We cannot
> >>> ensure all vectors have same trigger mode. And what's worse, the
> >>> vector in another vcpu is used to handle high frequency
> >>> interrupts(like 10G NIC), then it will hurt performance.
> >>>> 
> >>> I never saw OSes reuse vector used by ioapic, as far as I see this
> >> Could you point out which code does this check in Linux kernel? I don't
> >> see any special checks when Linux kernel allocates a vector.
> >> 
> > arch/x86/kernel/apic/io_apic.c:create_irq_nr(). It uses
> > apic->target_cpus() to get cpu mask. target_cpus() return mask of all
> > online cpus. Actually you wrote arch_pi_create_irq() in PI patches to
> > workaround this behaviour and allocated vector per cpu, no?
> Yes, when create an new irq, it will allocate vector from all online cpus. But after user changes the irq affinity, then the vector will reallocate with new cpumask. And this will leave the vector available on other cpus. 
>  
Since during vector allocation all cpus are checked vector will not be
reused if it is allocated on any cpu.

> > Are you aware of any guest that I can run, examine ioapic/apic
> > configuration and see that the same vector is used on different vcpus
> > for different devices? Can you point me to it?
> > 
Can you answer this?

> >>> is not how Linux code works. Furthermore it will not work with KVM
> >>> currently since apic eoi redirected to ioapic based on vector alone,
> >>> not vector/vcpu pair and as far as I am aware this is how real HW works.
> >> yes, real HW works in this way. But why it is helpful in this case?
> > It makes it impossible to use the same vector for different devices on
> > different cpus if the vector is delivered to at least one cpu through ioapic.
> > It may cause spurious interrupts, it will bring havoc to our ack
> > notifiers (albeit this is KVM's implementation problem). Also look at
> > various comment in arch/x86/kernel/apic/io_apic.c, it looks like ioapics
> > tend to misbehave if you look at them funny. Who knows what troubles EOIing
> > the same vector twice on real HW may bring.
> > 
> >> 
> >>>>> programmed into ioapic. Which two registers? All accesses to ioapic are
> >>>>> trapped and reconfiguration is rare.
> >>>> In logical mode, the destination VCPU is depend on each CPU's destination
> >>> format register and logical destination register. So we must also trap the two
> >>> registers.
> >>>> And if it uses lowest priority delivery mode, the PPR need to be trapped too.
> >>> Since PPR will change on each interrupt injection, the cost should be
> >>> higher than current approach. No need for all of that if bitmask it
> >>> global.
> >> No, the bitmask is per VCPU. Also, why it will work if bitmask is global?
> > Make in global. Why what will work?
> > 
> > And we need to trap format/logical destination/id registers anyway since
> > we need to build kvm->arch.apic_map table that is used to deliver
> > interrupts. BTW you can use this table to build per VCPU eoi bitmask
> > too, but I am not convinced it is needed in practice.
> Even KVM uses a simple way to implement the lowest priority delivery mode, we still need to trap all interrupts that use the lowest priority delivery mode. Because each interrupt will change CPU's priority and we need to recalculate the priority and iterate the whole ioapic entry to renew the eoi exiting bitmap. The cost should be worse than current way. I don't think it worth to do.
> 
Just set the bit on every vcpu that can get interrupt with lowest priority.

--
			Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html