On 2014/11/25 19:55, Marc Zyngier wrote:
> On 25/11/14 11:49, Christoffer Dall wrote:
>> On Tue, Nov 25, 2014 at 11:24:43AM +0000, Marc Zyngier wrote:
>>> On 25/11/14 11:11, Christoffer Dall wrote:
>>>> On Tue, Nov 25, 2014 at 10:54:18AM +0800, Shannon Zhao wrote:
>>>>> On 2014/11/24 18:53, Christoffer Dall wrote:
>>>>>> On Mon, Nov 24, 2014 at 03:53:16PM +0800, Shannon Zhao wrote:
>>>>>>> Hi Marc, Christoffer,
>>>>>>>
>>>>>>> On 2014/11/23 4:04, Christoffer Dall wrote:
>>>>>>>> On Wed, Nov 19, 2014 at 06:11:25PM +0800, Shannon Zhao wrote:
>>>>>>>>> When calling kvm_vgic_inject_irq to inject an interrupt, we can know
>>>>>>>>> which vcpu the interrupt is for from the irq_num and the cpuid, so we
>>>>>>>>> should just kick that vcpu instead of iterating through all of them.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Shannon Zhao <zhaoshenglong@xxxxxxxxxx>
>>>>>>>>
>>>>>>>> This looks reasonable to me:
>>>>>>>>
>>>>>>>> Reviewed-by: Christoffer Dall <christoffer.dall@xxxxxxxxxx>
>>>>>>>>
>>>>>>>> But as Marc said, we have to consider the churn of introducing more
>>>>>>>> changes to the vgic (that file is being hammered pretty intensely
>>>>>>>> these days), so if you feel this is an urgent optimization, it would
>>>>>>>> be useful to see some data backing this up.
>>>>>>>>
>>>>>>>
>>>>>>> Today I ran a test which measures the cycles spent in kvm_vgic_inject_irq
>>>>>>> using the PMU. I only tested the cycles for an SPI, using virtio-net.
>>>>>>> Test steps:
>>>>>>> 1) Start a VM with 8 VCPUs.
>>>>>>> 2) In the guest, bind the virtio irq to CPU8; ping the VM from the host
>>>>>>>    and collect the cycles.
>>>>>>>
>>>>>>> The test shows:
>>>>>>> Without this patch the cycle count is about 3700 (3300-5000); with this
>>>>>>> patch it is about 3000 (2500-3200).
>>>>>>> From this test, I think this patch can bring some improvement.
>>>>>>
>>>>>> Are these averaged numbers?
>>>>>>
>>>>>
>>>>> Yes :-)
>>>>>
>>>>>>>
>>>>>>> The test code is below. As there is almost no difference in
>>>>>>> vgic_update_irq_pending with and without this patch, I only measure the
>>>>>>> cycles of the kick.
>>>>>>>
>>>>>>> int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, unsigned int irq_num,
>>>>>>>                         bool level)
>>>>>>> {
>>>>>>>         unsigned long cycles_1, cycles_2;
>>>>>>>
>>>>>>>         if (likely(vgic_initialized(kvm)) &&
>>>>>>>             vgic_update_irq_pending(kvm, cpuid, irq_num, level)) {
>>>>>>>                 start_pmu();
>>>>>>>                 __asm__ __volatile__("MRS %0, PMCCNTR_EL0" : "=r"(cycles_1));
>>>>>>>                 vgic_kick_vcpus(kvm);
>>>>>>>                 __asm__ __volatile__("MRS %0, PMCCNTR_EL0" : "=r"(cycles_2));
>>>>>>>         }
>>>>>>>
>>>>>>>         return 0;
>>>>>>> }
>>>>>>>
>>>>>>> int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, unsigned int irq_num,
>>>>>>>                         bool level)
>>>>>>> {
>>>>>>>         int vcpu_id;
>>>>>>>         unsigned long cycles_a, cycles_b;
>>>>>>>
>>>>>>>         if (likely(vgic_initialized(kvm))) {
>>>>>>>                 vcpu_id = vgic_update_irq_pending(kvm, cpuid, irq_num, level);
>>>>>>>                 if (vcpu_id >= 0) {
>>>>>>>                         start_pmu();
>>>>>>>                         __asm__ __volatile__("MRS %0, PMCCNTR_EL0" : "=r"(cycles_a));
>>>>>>>                         /* kick the specified vcpu */
>>>>>>>                         kvm_vcpu_kick(kvm_get_vcpu(kvm, vcpu_id));
>>>>>>>                         __asm__ __volatile__("MRS %0, PMCCNTR_EL0" : "=r"(cycles_b));
>>>>>>>                 }
>>>>>>>         }
>>>>>>>
>>>>>>>         return 0;
>>>>>>> }
>>>>>>>
>>>>>>
>>>>>> Can you run some IPI-intensive benchmark in your guest and let us know
>>>>>> if you see improvements on that level?
>>>>>>
>>>>>
>>>>> Cool, I'll try to find some benchmarks and run them. Are there any
>>>>> IPI-intensive benchmarks you would suggest?
>>>>>
>>>>
>>>> Hackbench with processes sure seems to like IPIs.
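[As a side note for readers trying to reproduce the measurement above: the
start_pmu() helper is not shown anywhere in the thread. Below is a
hypothetical sketch of what such a helper could look like on ARMv8,
assuming it simply resets and enables the cycle counter so that the
PMCCNTR_EL0 reads return meaningful deltas; it is an assumption, not code
from the original patch.]

/*
 * Hypothetical start_pmu(): enable the ARMv8 cycle counter. Not part of
 * the original thread; the real helper may differ.
 */
static inline void start_pmu(void)
{
        unsigned long pmcr;

        /* PMCR_EL0: E (bit 0) enables the counters, C (bit 2) resets PMCCNTR_EL0 */
        __asm__ __volatile__("MRS %0, PMCR_EL0" : "=r"(pmcr));
        pmcr |= (1UL << 0) | (1UL << 2);
        __asm__ __volatile__("MSR PMCR_EL0, %0" :: "r"(pmcr));

        /* Bit 31 of PMCNTENSET_EL0 switches the cycle counter on */
        __asm__ __volatile__("MSR PMCNTENSET_EL0, %0" :: "r"(1UL << 31));
        __asm__ __volatile__("ISB");
}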
>>>
>>> But that'd mostly be IPIs in the guest, and we're hoping for that patch
>>> to result in a reduction in the number of IPIs on the host when
>>> interrupts are injected.
>>
>> Ah right, I remembered the SGI handling register calling
>> kvm_vgic_inject_irq(), but it doesn't.
>>
>>>
>>> I guess that having a workload that generates many interrupts on an SMP
>>> guest should result in a reduction of the number of IPIs on the host.
>>>
>>> What do you think?
>>>
>> That's sort of what Shannon did already, only we need to measure a drop
>> in overall cpu utilization on the host instead or look at iperf numbers
>> or something like that. Right?
>
> Yes. I guess that vmstat running in the background on the host should
> give a good indication of what is going on.
>

I measured the overall cpu utilization on the host using vmstat and iperf.
I started a VM with 8 vcpus and used iperf to send packets from host to
guest, binding the virtio interrupt to cpu0 and then to cpu7. The results
are the following:

Without this patch:

Bind to cpu0:
Bandwidth: 6.60 Gbits/sec
vmstat data:
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd    free  buff  cache   si   so    bi    bo    in    cs us sy id wa
 2  0      0 7795456     0 120568    0    0     0     0  8967 11405  2  2 96  0

Bind to cpu7:
Bandwidth: 6.13 Gbits/sec
vmstat data:
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd    free  buff  cache   si   so    bi    bo    in    cs us sy id wa
 1  0      0 7795016     0 120572    0    0     0     0 14633 20710  2  3 95  0

With this patch:

Bind to cpu0:
Bandwidth: 6.99 Gbits/sec
vmstat data:
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd    free  buff  cache   si   so    bi    bo    in    cs us sy id wa
 1  0      0 7788048     0 124836    0    0     0     0 10149 11593  2  2 96  0

Bind to cpu7:
Bandwidth: 6.53 Gbits/sec
vmstat data:
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd    free  buff  cache   si   so    bi    bo    in    cs us sy id wa
 1  0      0 7791044     0 124832    0    0     0     0 11408 14179  2  2 96  0

From the data, it shows some improvement :-)

Thanks,
Shannon

_______________________________________________
kvmarm mailing list
kvmarm@xxxxxxxxxxxxxxxxxxxxx
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
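[For readers without the surrounding source: the two code paths being
compared in this thread are the broadcast kick in vgic_kick_vcpus() versus a
targeted kvm_vcpu_kick(). A minimal sketch follows, based on a simplified
reading of virt/kvm/arm/vgic.c from that era (around Linux 3.18); it is not
the exact patch, and the helper name vgic_kick_one_vcpu() is invented here
purely for illustration.]

/*
 * Broadcast path: after an interrupt has been made pending, walk every
 * vcpu of the VM and kick each one that has a pending interrupt. With an
 * 8-vcpu guest this means 8 pending checks and potentially several host
 * IPIs per injected interrupt.
 */
static void vgic_kick_vcpus(struct kvm *kvm)
{
        struct kvm_vcpu *vcpu;
        int c;

        kvm_for_each_vcpu(c, vcpu, kvm) {
                if (kvm_vgic_vcpu_pending_irq(vcpu))
                        kvm_vcpu_kick(vcpu);
        }
}

/*
 * Targeted path (the idea of the patch): vgic_update_irq_pending() already
 * knows which vcpu the interrupt is routed to, so it can return that vcpu's
 * index and only that vcpu is kicked. The helper name is hypothetical; the
 * patch calls kvm_vcpu_kick() directly from kvm_vgic_inject_irq(), as in
 * the quoted test code.
 */
static void vgic_kick_one_vcpu(struct kvm *kvm, int vcpu_id)
{
        if (vcpu_id >= 0)
                kvm_vcpu_kick(kvm_get_vcpu(kvm, vcpu_id));
}

[The vmstat rows above line up with that picture: with the patch, the
cpu7-bound run shows fewer host interrupts and context switches (in/cs
dropping from 14633/20710 to 11408/14179) together with slightly higher
iperf bandwidth.]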