On Mon, Nov 24, 2014 at 03:53:16PM +0800, Shannon Zhao wrote:
> Hi Marc, Christoffer,
>
> On 2014/11/23 4:04, Christoffer Dall wrote:
> > On Wed, Nov 19, 2014 at 06:11:25PM +0800, Shannon Zhao wrote:
> >> When calling kvm_vgic_inject_irq to inject an interrupt, we can know
> >> which vcpu the interrupt is for from the irq_num and the cpuid, so we
> >> should just kick that vcpu instead of iterating through all of them.
> >>
> >> Signed-off-by: Shannon Zhao <zhaoshenglong@xxxxxxxxxx>
> >
> > This looks reasonable to me:
> >
> > Reviewed-by: Christoffer Dall <christoffer.dall@xxxxxxxxxx>
> >
> > But as Marc said, we have to consider the churn from introducing more
> > changes to the vgic (that file is being hammered pretty intensely
> > these days), so if you feel this is an urgent optimization, it would
> > be useful to see some data backing this up.
> >
> Today I ran a test that measures the cycles spent in kvm_vgic_inject_irq
> using the PMU. I only measured the cycles for an SPI, using virtio-net.
>
> Test steps:
> 1) Start a VM with 8 VCPUs.
> 2) In the guest, bind the virtio irq to CPU8; ping the VM from the host
>    and collect the cycle counts.
>
> The test shows:
> Without this patch the cycle count is about 3700 (range 3300-5000); with
> this patch it is about 3000 (range 2500-3200). From this test, I think
> this patch can bring some improvement.

Are these averaged numbers?

> The test code is below. Since there is almost no difference in
> vgic_update_irq_state with or without this patch, I only measured the
> cycles spent in the kick.
> int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, unsigned int irq_num,
>			  bool level)
> {
>	unsigned long cycles_1, cycles_2;
>
>	if (likely(vgic_initialized(kvm)) &&
>	    vgic_update_irq_pending(kvm, cpuid, irq_num, level)) {
>		start_pmu();
>		__asm__ __volatile__("MRS %0, PMCCNTR_EL0" : "=r"(cycles_1));
>		vgic_kick_vcpus(kvm);
>		__asm__ __volatile__("MRS %0, PMCCNTR_EL0" : "=r"(cycles_2));
>	}
>
>	return 0;
> }
>
> int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, unsigned int irq_num,
>			  bool level)
> {
>	int vcpu_id;
>	unsigned long cycles_a, cycles_b;
>
>	if (likely(vgic_initialized(kvm))) {
>		vcpu_id = vgic_update_irq_pending(kvm, cpuid, irq_num, level);
>		if (vcpu_id >= 0) {
>			start_pmu();
>			__asm__ __volatile__("MRS %0, PMCCNTR_EL0" : "=r"(cycles_a));
>			/* kick only the specified vcpu */
>			kvm_vcpu_kick(kvm_get_vcpu(kvm, vcpu_id));
>			__asm__ __volatile__("MRS %0, PMCCNTR_EL0" : "=r"(cycles_b));
>		}
>	}
>
>	return 0;
> }

Can you run some IPI-intensive benchmark in your guest and let us know if
you see improvements at that level?

Not trying to be overly pedantic here (I think your numbers suggest we
should merge this), but if the case you're optimizing doesn't happen very
often, we may not see this at the guest level or in overall CPU
utilization, and it would be very interesting to know.

Thanks!
-Christoffer
_______________________________________________
kvmarm mailing list
kvmarm@xxxxxxxxxxxxxxxxxxxxx
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm