On Wed, Nov 26, 2014 at 04:31:44PM +0800, Shannon Zhao wrote:
> On 2014/11/26 15:24, Christoffer Dall wrote:
> > On Wed, Nov 26, 2014 at 02:13:32PM +0800, Shannon Zhao wrote:
> >> On 2014/11/25 19:55, Marc Zyngier wrote:
> >>> On 25/11/14 11:49, Christoffer Dall wrote:
> >>>> On Tue, Nov 25, 2014 at 11:24:43AM +0000, Marc Zyngier wrote:
> >>>>> On 25/11/14 11:11, Christoffer Dall wrote:
> >>>>>> On Tue, Nov 25, 2014 at 10:54:18AM +0800, Shannon Zhao wrote:
> >>>>>>> On 2014/11/24 18:53, Christoffer Dall wrote:
> >>>>>>>> On Mon, Nov 24, 2014 at 03:53:16PM +0800, Shannon Zhao wrote:
> >>>>>>>>> Hi Marc, Christoffer,
> >>>>>>>>>
> >>>>>>>>> On 2014/11/23 4:04, Christoffer Dall wrote:
> >>>>>>>>>> On Wed, Nov 19, 2014 at 06:11:25PM +0800, Shannon Zhao wrote:
> >>>>>>>>>>> When kvm_vgic_inject_irq is called to inject an interrupt, we can tell
> >>>>>>>>>>> which vcpu the interrupt is for from the irq_num and the cpuid, so we
> >>>>>>>>>>> should just kick that vcpu instead of iterating through all of them.
> >>>>>>>>>>>
> >>>>>>>>>>> Signed-off-by: Shannon Zhao <zhaoshenglong@xxxxxxxxxx>
> >>>>>>>>>>
> >>>>>>>>>> This looks reasonable to me:
> >>>>>>>>>>
> >>>>>>>>>> Reviewed-by: Christoffer Dall <christoffer.dall@xxxxxxxxxx>
> >>>>>>>>>>
> >>>>>>>>>> But as Marc said, we have to consider the churn of introducing more
> >>>>>>>>>> changes to the vgic (that file is being hammered pretty intensely
> >>>>>>>>>> these days), so if you feel this is an urgent optimization, it would
> >>>>>>>>>> be useful to see some data backing this up.
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Today I ran a test that measures the cycle cost of kvm_vgic_inject_irq
> >>>>>>>>> with the PMU. I only measured the SPI path, using virtio-net.
> >>>>>>>>> Test steps:
> >>>>>>>>> 1) start a VM with 8 VCPUs
> >>>>>>>>> 2) in the guest, bind the virtio irq to CPU8; ping the VM from the host
> >>>>>>>>>    and collect the cycle counts
> >>>>>>>>>
> >>>>>>>>> The test shows:
> >>>>>>>>> without this patch the cycle count is about 3700 (range 3300-5000), and
> >>>>>>>>> with this patch it is about 3000 (range 2500-3200).
> >>>>>>>>> From this test, I think this patch can bring some improvement.
> >>>>>>>>
> >>>>>>>> Are these averaged numbers?
> >>>>>>>>
> >>>>>>>
> >>>>>>> Yes :-)
> >>>>>>>
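For readers skimming the thread, the change being timed below is roughly the
following. This is only a sketch pieced together from the patch description
above; the vgic_kick_vcpus() body is paraphrased from the pre-patch code and
may not match the tree exactly:

    /*
     * Before the patch: after an interrupt is injected, walk every vcpu of
     * the VM and kick any vcpu that has a pending interrupt.
     */
    static void vgic_kick_vcpus(struct kvm *kvm)
    {
            struct kvm_vcpu *vcpu;
            int c;

            kvm_for_each_vcpu(c, vcpu, kvm) {
                    if (kvm_vgic_vcpu_pending_irq(vcpu))
                            kvm_vcpu_kick(vcpu);
            }
    }

    /*
     * After the patch: vgic_update_irq_pending() reports which vcpu the
     * interrupt targets, so only that vcpu is kicked.
     */
    vcpu_id = vgic_update_irq_pending(kvm, cpuid, irq_num, level);
    if (vcpu_id >= 0)
            kvm_vcpu_kick(kvm_get_vcpu(kvm, vcpu_id));

The saving measured below comes from skipping the walk over all vcpus (and
the per-vcpu pending check) once the target is already known.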
> >>>>>>>>>
> >>>>>>>>> The test code is below. Since there is almost no difference in
> >>>>>>>>> vgic_update_irq_pending between the two versions, I only measure the
> >>>>>>>>> cycles of the kick itself.
> >>>>>>>>>
> >>>>>>>>> int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, unsigned int irq_num,
> >>>>>>>>>                         bool level)
> >>>>>>>>> {
> >>>>>>>>>         unsigned long cycles_1, cycles_2;
> >>>>>>>>>
> >>>>>>>>>         if (likely(vgic_initialized(kvm)) &&
> >>>>>>>>>             vgic_update_irq_pending(kvm, cpuid, irq_num, level)) {
> >>>>>>>>>                 start_pmu();
> >>>>>>>>>                 __asm__ __volatile__("MRS %0, PMCCNTR_EL0" : "=r"(cycles_1));
> >>>>>>>>>                 vgic_kick_vcpus(kvm);
> >>>>>>>>>                 __asm__ __volatile__("MRS %0, PMCCNTR_EL0" : "=r"(cycles_2));
> >>>>>>>>>         }
> >>>>>>>>>
> >>>>>>>>>         return 0;
> >>>>>>>>> }
> >>>>>>>>>
> >>>>>>>>> int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, unsigned int irq_num,
> >>>>>>>>>                         bool level)
> >>>>>>>>> {
> >>>>>>>>>         int vcpu_id;
> >>>>>>>>>         unsigned long cycles_a, cycles_b;
> >>>>>>>>>
> >>>>>>>>>         if (likely(vgic_initialized(kvm))) {
> >>>>>>>>>                 vcpu_id = vgic_update_irq_pending(kvm, cpuid, irq_num, level);
> >>>>>>>>>                 if (vcpu_id >= 0) {
> >>>>>>>>>                         start_pmu();
> >>>>>>>>>                         __asm__ __volatile__("MRS %0, PMCCNTR_EL0" : "=r"(cycles_a));
> >>>>>>>>>                         /* kick the specified vcpu */
> >>>>>>>>>                         kvm_vcpu_kick(kvm_get_vcpu(kvm, vcpu_id));
> >>>>>>>>>                         __asm__ __volatile__("MRS %0, PMCCNTR_EL0" : "=r"(cycles_b));
> >>>>>>>>>                 }
> >>>>>>>>>         }
> >>>>>>>>>         return 0;
> >>>>>>>>> }
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>> Can you run some IPI-intensive benchmark in your guest and let us know
> >>>>>>>> if you see improvements on that level?
> >>>>>>>>
> >>>>>>>
> >>>>>>> Cool, I'll try to find some benchmarks and run them. Are there any
> >>>>>>> IPI-intensive benchmarks you would suggest?
> >>>>>>>
> >>>>>>
> >>>>>> Hackbench with processes sure seems to like IPIs.
> >>>>>
> >>>>> But that'd mostly be IPIs in the guest, and we're hoping for that patch
> >>>>> to result in a reduction in the number of IPIs on the host when
> >>>>> interrupts are injected.
> >>>>
> >>>> Ah right, I remembered the SGI handling register calling
> >>>> kvm_vgic_inject_irq(), but it doesn't.
> >>>>
> >>>>>
> >>>>> I guess that a workload that generates many interrupts in an SMP guest
> >>>>> should result in a reduction of the number of IPIs on the host.
> >>>>>
> >>>>> What do you think?
> >>>>>
> >>>> That's sort of what Shannon did already, only we need to measure a drop
> >>>> in overall cpu utilization on the host instead, or look at iperf numbers
> >>>> or something like that. Right?
> >>>
> >>> Yes. I guess that vmstat running in the background on the host should
> >>> give a good indication of what is going on.
> >>>
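To make the host-IPI point above concrete: kvm_vcpu_kick() boils down to
roughly the sketch below (paraphrased from memory for kernels of this
vintage; the exact checks differ between versions). Kicking all vcpus can
therefore cost up to one reschedule IPI per running vcpu with a pending
interrupt, while the targeted kick sends at most one:

    /*
     * Rough sketch of kvm_vcpu_kick(): wake the vcpu thread if it is
     * blocked in halt, otherwise send a reschedule IPI to the physical CPU
     * the vcpu is currently running on, forcing it out of the guest so it
     * notices the newly pending interrupt.
     */
    void kvm_vcpu_kick(struct kvm_vcpu *vcpu)
    {
            int me;
            int cpu = vcpu->cpu;
            wait_queue_head_t *wqp = kvm_arch_vcpu_wq(vcpu);

            if (waitqueue_active(wqp)) {
                    wake_up_interruptible(wqp);
                    ++vcpu->stat.halt_wakeup;
            }

            me = get_cpu();
            if (cpu != me && (unsigned)cpu < nr_cpu_ids && cpu_online(cpu))
                    if (kvm_arch_vcpu_should_kick(vcpu))
                            smp_send_reschedule(cpu);       /* the host IPI */
            put_cpu();
    }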
> >>
> >> I measured the overall cpu utilization on the host using vmstat while
> >> running iperf. I started a VM with 8 vcpus and used iperf to send packets
> >> from the host to the guest, binding the virtio interrupt to cpu0 and to cpu7.
> >>
> >> The results are as follows:
> >>
> >> Without this patch:
> >> Bind to cpu0:
> >> Bandwidth: 6.60 Gbits/sec
> >> vmstat data:
> >> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
> >>  r  b   swpd    free   buff  cache   si   so    bi    bo    in    cs us sy id wa
> >>  2  0      0 7795456      0 120568    0    0     0     0  8967 11405  2  2 96  0
> >>
> >> Bind to cpu7:
> >> Bandwidth: 6.13 Gbits/sec
> >> vmstat data:
> >> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
> >>  r  b   swpd    free   buff  cache   si   so    bi    bo    in    cs us sy id wa
> >>  1  0      0 7795016      0 120572    0    0     0     0 14633 20710  2  3 95  0
> >>
> >> With this patch:
> >> Bind to cpu0:
> >> Bandwidth: 6.99 Gbits/sec
> >> vmstat data:
> >> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
> >>  r  b   swpd    free   buff  cache   si   so    bi    bo    in    cs us sy id wa
> >>  1  0      0 7788048      0 124836    0    0     0     0 10149 11593  2  2 96  0
> >>
> >> Bind to cpu7:
> >> Bandwidth: 6.53 Gbits/sec
> >> vmstat data:
> >> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
> >>  r  b   swpd    free   buff  cache   si   so    bi    bo    in    cs us sy id wa
> >>  1  0      0 7791044      0 124832    0    0     0     0 11408 14179  2  2 96  0
> >>
> >> From the data, there is some improvement :-)
> >>
> > Indeed things are promising, but I have some questions:
> >
> >  - Are these numbers consistent (averaged over how many runs, and what is
> >    the variance)?
> >  - Why the seemingly consistent difference between binding to cpu0 and
> >    binding to cpu7?
>
> The vmstat command line is "vmstat 1 1000", and I pasted representative
> numbers, not averages. The difference between with and without this patch
> when binding to cpu7 is obvious, but the difference when binding to cpu0
> is not.
>
> >  - What were the iperf parameters?
>
> I forgot to paste them:
> ./iperf -c 192.168.0.2 -P 1 -p 5001 -f G -t 60
>
> >  - The number of context switches going down for cpu7 between the
> >    with/without-patch runs is interesting, but the fact that we don't see
> >    this for cpu0 makes me doubt that it's meaningful.
> >
>
> Ah, sorry, I didn't take averaged numbers for these. With or without this
> patch there is almost no difference in the number of context switches when
> binding to cpu0; the numbers are at the same level.
>
> > Based purely on the bandwidth, and assuming those numbers are consistent,
> > I would say we should go ahead and merge this.
> >
>
> Thanks. The bandwidth numbers are averaged and consistent.

So I would say "ship it", Marc?

-Christoffer

_______________________________________________
kvmarm mailing list
kvmarm@xxxxxxxxxxxxxxxxxxxxx
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm