On Wed, Nov 26, 2014 at 04:31:44PM +0800, Shannon Zhao wrote:
> On 2014/11/26 15:24, Christoffer Dall wrote:
> > On Wed, Nov 26, 2014 at 02:13:32PM +0800, Shannon Zhao wrote:
> >> On 2014/11/25 19:55, Marc Zyngier wrote:
> >>> On 25/11/14 11:49, Christoffer Dall wrote:
> >>>> On Tue, Nov 25, 2014 at 11:24:43AM +0000, Marc Zyngier wrote:
> >>>>> On 25/11/14 11:11, Christoffer Dall wrote:
> >>>>>> On Tue, Nov 25, 2014 at 10:54:18AM +0800, Shannon Zhao wrote:
> >>>>>>> On 2014/11/24 18:53, Christoffer Dall wrote:
> >>>>>>>> On Mon, Nov 24, 2014 at 03:53:16PM +0800, Shannon Zhao wrote:
> >>>>>>>>> Hi Marc, Christoffer,
> >>>>>>>>>
> >>>>>>>>> On 2014/11/23 4:04, Christoffer Dall wrote:
> >>>>>>>>>> On Wed, Nov 19, 2014 at 06:11:25PM +0800, Shannon Zhao wrote:
> >>>>>>>>>>> When kvm_vgic_inject_irq is called to inject an interrupt, we can tell
> >>>>>>>>>>> which vcpu the interrupt is for from the irq_num and the cpuid, so we
> >>>>>>>>>>> should just kick that vcpu instead of iterating through all of them.
> >>>>>>>>>>>
> >>>>>>>>>>> Signed-off-by: Shannon Zhao <zhaoshenglong@xxxxxxxxxx>
> >>>>>>>>>>
> >>>>>>>>>> This looks reasonable to me:
> >>>>>>>>>>
> >>>>>>>>>> Reviewed-by: Christoffer Dall <christoffer.dall@xxxxxxxxxx>
> >>>>>>>>>>
> >>>>>>>>>> But as Marc said, we have to consider the churn of introducing more
> >>>>>>>>>> changes to the vgic (that file is being hammered pretty intensely
> >>>>>>>>>> these days), so if you feel this is an urgent optimization, it would
> >>>>>>>>>> be useful to see some data backing this up.
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Today I ran a test that measures the cycle cost of kvm_vgic_inject_irq
> >>>>>>>>> with the PMU. I only measured the SPI path, using virtio-net.
> >>>>>>>>> Test steps:
> >>>>>>>>> 1) start a VM with 8 VCPUs
> >>>>>>>>> 2) in the guest, bind the virtio irq to CPU8; ping the VM from the host
> >>>>>>>>>    and collect the cycle counts
> >>>>>>>>>
> >>>>>>>>> The test shows:
> >>>>>>>>> without this patch the cycle count is about 3700 (range 3300-5000), and
> >>>>>>>>> with this patch it is about 3000 (range 2500-3200).
> >>>>>>>>> From this test, I think this patch can bring some improvement.
> >>>>>>>>
> >>>>>>>> Are these averaged numbers?
> >>>>>>>>
> >>>>>>>
> >>>>>>> Yes :-)
> >>>>>>>
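For readers skimming the thread, the change being timed below is roughly the
following. This is only a sketch pieced together from the patch description
above; the vgic_kick_vcpus() body is paraphrased from the pre-patch code and
may not match the tree exactly:

    /*
     * Before the patch: after an interrupt is injected, walk every vcpu of
     * the VM and kick any vcpu that has a pending interrupt.
     */
    static void vgic_kick_vcpus(struct kvm *kvm)
    {
            struct kvm_vcpu *vcpu;
            int c;

            kvm_for_each_vcpu(c, vcpu, kvm) {
                    if (kvm_vgic_vcpu_pending_irq(vcpu))
                            kvm_vcpu_kick(vcpu);
            }
    }

    /*
     * After the patch: vgic_update_irq_pending() reports which vcpu the
     * interrupt targets, so only that vcpu is kicked.
     */
    vcpu_id = vgic_update_irq_pending(kvm, cpuid, irq_num, level);
    if (vcpu_id >= 0)
            kvm_vcpu_kick(kvm_get_vcpu(kvm, vcpu_id));

The saving measured below comes from skipping the walk over all vcpus (and
the per-vcpu pending check) once the target is already known.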
> >>>>>>>>>
> >>>>>>>>> The test code is below. Since there is almost no difference in
> >>>>>>>>> vgic_update_irq_pending between the two versions, I only measure the
> >>>>>>>>> cycles of the kick itself.
> >>>>>>>>>
> >>>>>>>>> int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, unsigned int irq_num,
> >>>>>>>>>                         bool level)
> >>>>>>>>> {
> >>>>>>>>>         unsigned long cycles_1, cycles_2;
> >>>>>>>>>
> >>>>>>>>>         if (likely(vgic_initialized(kvm)) &&
> >>>>>>>>>             vgic_update_irq_pending(kvm, cpuid, irq_num, level)) {
> >>>>>>>>>                 start_pmu();
> >>>>>>>>>                 __asm__ __volatile__("MRS %0, PMCCNTR_EL0" : "=r"(cycles_1));
> >>>>>>>>>                 vgic_kick_vcpus(kvm);
> >>>>>>>>>                 __asm__ __volatile__("MRS %0, PMCCNTR_EL0" : "=r"(cycles_2));
> >>>>>>>>>         }
> >>>>>>>>>
> >>>>>>>>>         return 0;
> >>>>>>>>> }
> >>>>>>>>>
> >>>>>>>>> int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, unsigned int irq_num,
> >>>>>>>>>                         bool level)
> >>>>>>>>> {
> >>>>>>>>>         int vcpu_id;
> >>>>>>>>>         unsigned long cycles_a, cycles_b;
> >>>>>>>>>
> >>>>>>>>>         if (likely(vgic_initialized(kvm))) {
> >>>>>>>>>                 vcpu_id = vgic_update_irq_pending(kvm, cpuid, irq_num, level);
> >>>>>>>>>                 if (vcpu_id >= 0) {
> >>>>>>>>>                         start_pmu();
> >>>>>>>>>                         __asm__ __volatile__("MRS %0, PMCCNTR_EL0" : "=r"(cycles_a));
> >>>>>>>>>                         /* kick the specified vcpu */
> >>>>>>>>>                         kvm_vcpu_kick(kvm_get_vcpu(kvm, vcpu_id));
> >>>>>>>>>                         __asm__ __volatile__("MRS %0, PMCCNTR_EL0" : "=r"(cycles_b));
> >>>>>>>>>                 }
> >>>>>>>>>         }
> >>>>>>>>>         return 0;
> >>>>>>>>> }
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>> Can you run some IPI-intensive benchmark in your guest and let us know
> >>>>>>>> if you see improvements on that level?
> >>>>>>>>
> >>>>>>>
> >>>>>>> Cool, I'll try to find some benchmarks and run them. Are there any
> >>>>>>> IPI-intensive benchmarks you would suggest?
> >>>>>>>
> >>>>>>
> >>>>>> Hackbench with processes sure seems to like IPIs.
> >>>>>
> >>>>> But that'd mostly be IPIs in the guest, and we're hoping for that patch
> >>>>> to result in a reduction in the number of IPIs on the host when
> >>>>> interrupts are injected.
> >>>>
> >>>> Ah right, I remembered the SGI handling register calling
> >>>> kvm_vgic_inject_irq(), but it doesn't.
> >>>>
> >>>>>
> >>>>> I guess that a workload that generates many interrupts in an SMP guest
> >>>>> should result in a reduction of the number of IPIs on the host.
> >>>>>
> >>>>> What do you think?
> >>>>>
> >>>> That's sort of what Shannon did already, only we need to measure a drop
> >>>> in overall cpu utilization on the host instead, or look at iperf numbers
> >>>> or something like that. Right?
> >>>
> >>> Yes. I guess that vmstat running in the background on the host should
> >>> give a good indication of what is going on.
> >>>
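To make the host-IPI point above concrete: kvm_vcpu_kick() boils down to
roughly the sketch below (paraphrased from memory for kernels of this
vintage; the exact checks differ between versions). Kicking all vcpus can
therefore cost up to one reschedule IPI per running vcpu with a pending
interrupt, while the targeted kick sends at most one:

    /*
     * Rough sketch of kvm_vcpu_kick(): wake the vcpu thread if it is
     * blocked in halt, otherwise send a reschedule IPI to the physical CPU
     * the vcpu is currently running on, forcing it out of the guest so it
     * notices the newly pending interrupt.
     */
    void kvm_vcpu_kick(struct kvm_vcpu *vcpu)
    {
            int me;
            int cpu = vcpu->cpu;
            wait_queue_head_t *wqp = kvm_arch_vcpu_wq(vcpu);

            if (waitqueue_active(wqp)) {
                    wake_up_interruptible(wqp);
                    ++vcpu->stat.halt_wakeup;
            }

            me = get_cpu();
            if (cpu != me && (unsigned)cpu < nr_cpu_ids && cpu_online(cpu))
                    if (kvm_arch_vcpu_should_kick(vcpu))
                            smp_send_reschedule(cpu);       /* the host IPI */
            put_cpu();
    }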
> >>
> >> I measured the overall cpu utilization on the host using vmstat while
> >> running iperf. I started a VM with 8 vcpus and used iperf to send packets
> >> from the host to the guest, binding the virtio interrupt to cpu0 and to cpu7.
> >>
> >> The results are as follows:
> >>
> >> Without this patch:
> >> Bind to cpu0:
> >> Bandwidth: 6.60 Gbits/sec
> >> vmstat data:
> >> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
> >>  r  b   swpd    free   buff  cache   si   so    bi    bo    in    cs us sy id wa
> >>  2  0      0 7795456      0 120568    0    0     0     0  8967 11405  2  2 96  0
> >>
> >> Bind to cpu7:
> >> Bandwidth: 6.13 Gbits/sec
> >> vmstat data:
> >> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
> >>  r  b   swpd    free   buff  cache   si   so    bi    bo    in    cs us sy id wa
> >>  1  0      0 7795016      0 120572    0    0     0     0 14633 20710  2  3 95  0
> >>
> >> With this patch:
> >> Bind to cpu0:
> >> Bandwidth: 6.99 Gbits/sec
> >> vmstat data:
> >> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
> >>  r  b   swpd    free   buff  cache   si   so    bi    bo    in    cs us sy id wa
> >>  1  0      0 7788048      0 124836    0    0     0     0 10149 11593  2  2 96  0
> >>
> >> Bind to cpu7:
> >> Bandwidth: 6.53 Gbits/sec
> >> vmstat data:
> >> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
> >>  r  b   swpd    free   buff  cache   si   so    bi    bo    in    cs us sy id wa
> >>  1  0      0 7791044      0 124832    0    0     0     0 11408 14179  2  2 96  0
> >>
> >> From the data, there is some improvement :-)
> >>
> > Indeed things are promising, but I have some questions:
> >
> >  - Are these numbers consistent (averaged over how many runs, and what is
> >    the variance)?
> >  - Why the seemingly consistent difference between binding to cpu0 and
> >    binding to cpu7?
>
> The vmstat command line is "vmstat 1 1000", and I pasted representative
> numbers, not averages. The difference between with and without this patch
> when binding to cpu7 is obvious, but the difference when binding to cpu0
> is not.
>
> >  - What were the iperf parameters?
>
> I forgot to paste them:
> ./iperf -c 192.168.0.2 -P 1 -p 5001 -f G -t 60
>
> >  - The number of context switches going down for cpu7 between the
> >    with/without-patch runs is interesting, but the fact that we don't see
> >    this for cpu0 makes me doubt that it's meaningful.
> >
>
> Ah, sorry, I didn't take averaged numbers for these. With or without this
> patch there is almost no difference in the number of context switches when
> binding to cpu0; the numbers are at the same level.
>
> > Based purely on the bandwidth, and assuming those numbers are consistent,
> > I would say we should go ahead and merge this.
> >
>
> Thanks. The bandwidth numbers are averaged and consistent.

So I would say "ship it", Marc?

-Christoffer

_______________________________________________
kvmarm mailing list
kvmarm@xxxxxxxxxxxxxxxxxxxxx
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm