On 2014/11/24 18:53, Christoffer Dall wrote:
> On Mon, Nov 24, 2014 at 03:53:16PM +0800, Shannon Zhao wrote:
>> Hi Marc, Christoffer,
>>
>> On 2014/11/23 4:04, Christoffer Dall wrote:
>>> On Wed, Nov 19, 2014 at 06:11:25PM +0800, Shannon Zhao wrote:
>>>> When calling kvm_vgic_inject_irq to inject an interrupt, we can know
>>>> which vcpu the interrupt is for from the irq_num and the cpuid, so we
>>>> should kick only that vcpu instead of iterating through all of them.
>>>>
>>>> Signed-off-by: Shannon Zhao <zhaoshenglong@xxxxxxxxxx>
>>>
>>> This looks reasonable to me:
>>>
>>> Reviewed-by: Christoffer Dall <christoffer.dall@xxxxxxxxxx>
>>>
>>> But as Marc said, we have to consider the churn introduced by more
>>> changes to the vgic (that file is being hammered pretty intensely
>>> these days), so if you feel this is an urgent optimization, it would
>>> be useful to see some data backing it up.
>>>
>>
>> Today I ran a test that measures the cycles spent in kvm_vgic_inject_irq
>> using the PMU. I only measured the cycles for SPIs, using virtio-net.
>> Test steps:
>> 1) Start a VM with 8 VCPUs.
>> 2) In the guest, bind the virtio irq to CPU8; ping the VM from the
>>    host and collect the cycle counts.
>>
>> The test shows that without this patch the cost is about 3700 cycles
>> (range 3300-5000), and with this patch it is about 3000 cycles
>> (range 2500-3200). From this test, I think the patch brings some
>> improvement.
>
> Are these averaged numbers?
>
Yes :-)

>>
>> The test code is below. Since vgic_update_irq_pending is almost
>> unchanged between the two versions, I only measured the cycles of
>> the kick.
>>
>> int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, unsigned int irq_num,
>>			bool level)
>> {
>>	unsigned long cycles_1, cycles_2;
>>
>>	if (likely(vgic_initialized(kvm)) &&
>>	    vgic_update_irq_pending(kvm, cpuid, irq_num, level)) {
>>		start_pmu();
>>		__asm__ __volatile__("MRS %0, PMCCNTR_EL0" : "=r"(cycles_1));
>>		vgic_kick_vcpus(kvm);
>>		__asm__ __volatile__("MRS %0, PMCCNTR_EL0" : "=r"(cycles_2));
>>	}
>>
>>	return 0;
>> }
>>
>> int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, unsigned int irq_num,
>>			bool level)
>> {
>>	int vcpu_id;
>>	unsigned long cycles_a, cycles_b;
>>
>>	if (likely(vgic_initialized(kvm))) {
>>		vcpu_id = vgic_update_irq_pending(kvm, cpuid, irq_num, level);
>>		if (vcpu_id >= 0) {
>>			start_pmu();
>>			__asm__ __volatile__("MRS %0, PMCCNTR_EL0" : "=r"(cycles_a));
>>			/* kick the specified vcpu */
>>			kvm_vcpu_kick(kvm_get_vcpu(kvm, vcpu_id));
>>			__asm__ __volatile__("MRS %0, PMCCNTR_EL0" : "=r"(cycles_b));
>>		}
>>	}
>>
>>	return 0;
>> }
>>
>
> Can you run some IPI-intensive benchmark in your guest and let us know
> if you see improvements on that level?
>
Cool, I'll try to find some benchmarks and run them. Are there any
IPI-intensive benchmarks you would suggest?

> Not trying to be overly pedantic here (I think your numbers suggest we
> should merge this), but if the case you're optimizing doesn't happen
> very often, we may not see this on a guest level or overall CPU
> utilization level, and it would be very interesting to know.
>
Yeah, it would be interesting.

Thanks,
Shannon
_______________________________________________
kvmarm mailing list
kvmarm@xxxxxxxxxxxxxxxxxxxxx
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm