On Tue, Nov 25, 2014 at 10:54:18AM +0800, Shannon Zhao wrote:
> On 2014/11/24 18:53, Christoffer Dall wrote:
> > On Mon, Nov 24, 2014 at 03:53:16PM +0800, Shannon Zhao wrote:
> >> Hi Marc, Christoffer,
> >>
> >> On 2014/11/23 4:04, Christoffer Dall wrote:
> >>> On Wed, Nov 19, 2014 at 06:11:25PM +0800, Shannon Zhao wrote:
> >>>> When calling kvm_vgic_inject_irq to inject an interrupt, we can
> >>>> know which vcpu the interrupt is for from the irq_num and the
> >>>> cpuid, so we should just kick that vcpu instead of iterating
> >>>> through all of them.
> >>>>
> >>>> Signed-off-by: Shannon Zhao <zhaoshenglong@xxxxxxxxxx>
> >>>
> >>> This looks reasonable to me:
> >>>
> >>> Reviewed-by: Christoffer Dall <christoffer.dall@xxxxxxxxxx>
> >>>
> >>> But as Marc said, we have to consider the churn introduced by more
> >>> changes to the vgic (that file is being hammered pretty intensely
> >>> these days), so if you feel this is an urgent optimization, it would
> >>> be useful to see some data backing this up.
> >>>
> >>
> >> Today I ran a test that measures the cycles spent in
> >> kvm_vgic_inject_irq using the PMU. I only measured the cycles for an
> >> SPI, using virtio-net.
> >> Test steps:
> >> 1) Start a VM with 8 VCPUs.
> >> 2) In the guest, bind the virtio irq to CPU8; ping the VM from the
> >>    host and record the cycles.
> >>
> >> The test shows:
> >> Without this patch, the cost is about 3700 cycles (range 3300-5000);
> >> with this patch, it is about 3000 cycles (range 2500-3200).
> >> From this test, I think this patch brings some improvement.
> >
> > Are these averaged numbers?
> >
>
> Yes :-)
>
> >> The test code is shown below. As there is almost no difference in
> >> vgic_update_irq_pending between with and without this patch, only
> >> the kick's cycles are measured.
> >>
> >> int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, unsigned int irq_num,
> >>                         bool level)
> >> {
> >>         unsigned long cycles_1, cycles_2;
> >>
> >>         if (likely(vgic_initialized(kvm)) &&
> >>             vgic_update_irq_pending(kvm, cpuid, irq_num, level)) {
> >>                 start_pmu();
> >>                 __asm__ __volatile__("MRS %0, PMCCNTR_EL0" : "=r"(cycles_1));
> >>                 vgic_kick_vcpus(kvm);
> >>                 __asm__ __volatile__("MRS %0, PMCCNTR_EL0" : "=r"(cycles_2));
> >>         }
> >>
> >>         return 0;
> >> }
> >>
> >> int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, unsigned int irq_num,
> >>                         bool level)
> >> {
> >>         int vcpu_id;
> >>         unsigned long cycles_a, cycles_b;
> >>
> >>         if (likely(vgic_initialized(kvm))) {
> >>                 vcpu_id = vgic_update_irq_pending(kvm, cpuid, irq_num, level);
> >>                 if (vcpu_id >= 0) {
> >>                         start_pmu();
> >>                         __asm__ __volatile__("MRS %0, PMCCNTR_EL0" : "=r"(cycles_a));
> >>                         /* kick the specified vcpu */
> >>                         kvm_vcpu_kick(kvm_get_vcpu(kvm, vcpu_id));
> >>                         __asm__ __volatile__("MRS %0, PMCCNTR_EL0" : "=r"(cycles_b));
> >>                 }
> >>         }
> >>
> >>         return 0;
> >> }
> >>
> >
> > Can you run some IPI-intensive benchmark in your guest and let us know
> > if you see improvements on that level?
> >
>
> Cool, I'll try to find some benchmarks and run them. Are there some
> IPI-intensive benchmarks you suggest?
>

Hackbench with processes sure seems to like IPIs.

> > Not trying to be overly-pedantic here (I think your numbers suggest we
> > should merge this), but if the case you're optimizing doesn't happen
> > very often, we may not see this on a guest level or overall CPU
> > utilization level, and it would be very interesting to know.
> >
>
> Yeah, it would be interesting.
>

Thanks,
-Christoffer
_______________________________________________
kvmarm mailing list
kvmarm@xxxxxxxxxxxxxxxxxxxxx
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm