On Tue, Nov 25, 2014 at 11:24:43AM +0000, Marc Zyngier wrote:
> On 25/11/14 11:11, Christoffer Dall wrote:
> > On Tue, Nov 25, 2014 at 10:54:18AM +0800, Shannon Zhao wrote:
> >> On 2014/11/24 18:53, Christoffer Dall wrote:
> >>> On Mon, Nov 24, 2014 at 03:53:16PM +0800, Shannon Zhao wrote:
> >>>> Hi Marc, Christoffer,
> >>>>
> >>>> On 2014/11/23 4:04, Christoffer Dall wrote:
> >>>>> On Wed, Nov 19, 2014 at 06:11:25PM +0800, Shannon Zhao wrote:
> >>>>>> When calling kvm_vgic_inject_irq to inject an interrupt, we can
> >>>>>> know which vcpu the interrupt is for from the irq_num and the
> >>>>>> cpuid, so we should just kick that vcpu instead of iterating
> >>>>>> through all of them.
> >>>>>>
> >>>>>> Signed-off-by: Shannon Zhao <zhaoshenglong@xxxxxxxxxx>
> >>>>>
> >>>>> This looks reasonable to me:
> >>>>>
> >>>>> Reviewed-by: Christoffer Dall <christoffer.dall@xxxxxxxxxx>
> >>>>>
> >>>>> But as Marc said, we have to consider the churn by introducing more
> >>>>> changes to the vgic (that file is being hammered pretty intensely
> >>>>> these days), so if you feel this is an urgent optimization, it would
> >>>>> be useful to see some data backing this up.
> >>>>>
> >>>>
> >>>> Today I ran a test which measures the cycles spent in
> >>>> kvm_vgic_inject_irq using the PMU. I only tested the cycles for an
> >>>> SPI, using virtio-net.
> >>>>
> >>>> Test steps:
> >>>> 1) start a VM with 8 VCPUs
> >>>> 2) in the guest, bind the virtio irq to CPU8; ping the VM from the
> >>>>    host and collect the cycle counts
> >>>>
> >>>> The test shows:
> >>>> Without this patch the cycle count is about 3700 (range 3300-5000);
> >>>> with this patch it is about 3000 (range 2500-3200).
> >>>> From this test, I think this patch can bring some improvement.
> >>>
> >>> Are these averaged numbers?
> >>>
> >>
> >> Yes :-)
> >>
> >>>> The test code is shown below. Since vgic_update_irq_pending is
> >>>> almost identical with and without this patch, only the kick's
> >>>> cycles are measured.
> >>>>
> >>>> int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, unsigned int irq_num,
> >>>>                         bool level)
> >>>> {
> >>>>         unsigned long cycles_1, cycles_2;
> >>>>
> >>>>         if (likely(vgic_initialized(kvm)) &&
> >>>>             vgic_update_irq_pending(kvm, cpuid, irq_num, level)) {
> >>>>                 start_pmu();
> >>>>                 __asm__ __volatile__("MRS %0, PMCCNTR_EL0" : "=r"(cycles_1));
> >>>>                 vgic_kick_vcpus(kvm);
> >>>>                 __asm__ __volatile__("MRS %0, PMCCNTR_EL0" : "=r"(cycles_2));
> >>>>         }
> >>>>
> >>>>         return 0;
> >>>> }
> >>>>
> >>>> int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, unsigned int irq_num,
> >>>>                         bool level)
> >>>> {
> >>>>         int vcpu_id;
> >>>>         unsigned long cycles_a, cycles_b;
> >>>>
> >>>>         if (likely(vgic_initialized(kvm))) {
> >>>>                 vcpu_id = vgic_update_irq_pending(kvm, cpuid, irq_num, level);
> >>>>                 if (vcpu_id >= 0) {
> >>>>                         start_pmu();
> >>>>                         __asm__ __volatile__("MRS %0, PMCCNTR_EL0" : "=r"(cycles_a));
> >>>>                         /* kick the specified vcpu */
> >>>>                         kvm_vcpu_kick(kvm_get_vcpu(kvm, vcpu_id));
> >>>>                         __asm__ __volatile__("MRS %0, PMCCNTR_EL0" : "=r"(cycles_b));
> >>>>                 }
> >>>>         }
> >>>>
> >>>>         return 0;
> >>>> }
> >>>>
> >>>
> >>> Can you run some IPI-intensive benchmark in your guest and let us know
> >>> if you see improvements on that level?
> >>>
> >>
> >> Cool, I'll try to find some benchmarks and run them. Are there some
> >> IPI-intensive benchmarks you suggest?
> >>
> >
> > Hackbench with processes sure seems to like IPIs.
>
> But that'd mostly be IPIs in the guest, and we're hoping for that patch
> to result in a reduction in the number of IPIs on the host when
> interrupts are injected.
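
For context, the kick in question comes from vgic_kick_vcpus(), which
Shannon's first snippet wraps with the cycle-counter reads. If I remember
the current code correctly, it does roughly the following (a rough sketch,
not necessarily the exact mainline body):

static void vgic_kick_vcpus(struct kvm *kvm)
{
        struct kvm_vcpu *vcpu;
        int c;

        /* Walk every vcpu and kick any that has a pending interrupt. */
        kvm_for_each_vcpu(c, vcpu, kvm) {
                if (kvm_vgic_vcpu_pending_irq(vcpu))
                        kvm_vcpu_kick(vcpu);
        }
}

so a single injection can end up sending IPIs to several host CPUs, and
kicking only the target vcpu is what should cut that down.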
Ah right, I thought the SGI handling register code called
kvm_vgic_inject_irq(), but it doesn't.

> I guess that having a workload that generates many interrupts on an SMP
> guest should result in a reduction of the number of IPIs on the host.
>
> What do you think?
>

That's sort of what Shannon did already, only we need to measure a drop
in overall cpu utilization on the host instead, or look at iperf numbers
or something like that.  Right?

-Christoffer
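
For reference, the start_pmu() helper used in Shannon's test snippets
above is not shown in the thread; a minimal sketch of what such a helper
could look like on ARMv8 (an assumption, not Shannon's actual code) is:

static inline void start_pmu(void)
{
        unsigned long val;

        /* PMCR_EL0: set E (enable counters) and C (reset the cycle counter). */
        __asm__ __volatile__("MRS %0, PMCR_EL0" : "=r"(val));
        val |= (1UL << 0) | (1UL << 2);
        __asm__ __volatile__("MSR PMCR_EL0, %0" : : "r"(val));

        /* PMCNTENSET_EL0 bit 31 enables the cycle counter PMCCNTR_EL0. */
        __asm__ __volatile__("MSR PMCNTENSET_EL0, %0" : : "r"(1UL << 31));
}

PMCCNTR_EL0 only advances once those enable bits are set, which is
presumably why the test calls start_pmu() right before the first counter
read.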