On Mon, Nov 24, 2014 at 03:53:16PM +0800, Shannon Zhao wrote:
> Hi Marc, Christoffer,
>
> On 2014/11/23 4:04, Christoffer Dall wrote:
> > On Wed, Nov 19, 2014 at 06:11:25PM +0800, Shannon Zhao wrote:
> >> When calling kvm_vgic_inject_irq to inject an interrupt, we can know
> >> which vcpu the interrupt is for from the irq_num and the cpuid, so we
> >> should just kick that vcpu instead of iterating through all of them.
> >>
> >> Signed-off-by: Shannon Zhao <zhaoshenglong@xxxxxxxxxx>
> >
> > This looks reasonable to me:
> >
> > Reviewed-by: Christoffer Dall <christoffer.dall@xxxxxxxxxx>
> >
> > But as Marc said, we have to consider the churn from introducing more
> > changes to the vgic (that file is being hammered pretty intensely
> > these days), so if you feel this is an urgent optimization, it would
> > be useful to see some data backing this up.
> >
> Today I ran a test that measures the cycles spent in kvm_vgic_inject_irq
> using the PMU. I only measured the cycles for an SPI, using virtio-net.
>
> Test steps:
> 1) Start a VM with 8 VCPUs.
> 2) In the guest, bind the virtio irq to CPU8; ping the VM from the host
>    and collect the cycle counts.
>
> The test shows:
> Without this patch the cycle count is about 3700 (range 3300-5000); with
> this patch it is about 3000 (range 2500-3200). From this test, I think
> this patch can bring some improvement.

Are these averaged numbers?

> The test code is below. Since there is almost no difference in
> vgic_update_irq_state with or without this patch, I only measured the
> cycles spent in the kick.
> int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, unsigned int irq_num,
>			  bool level)
> {
>	unsigned long cycles_1, cycles_2;
>
>	if (likely(vgic_initialized(kvm)) &&
>	    vgic_update_irq_pending(kvm, cpuid, irq_num, level)) {
>		start_pmu();
>		__asm__ __volatile__("MRS %0, PMCCNTR_EL0" : "=r"(cycles_1));
>		vgic_kick_vcpus(kvm);
>		__asm__ __volatile__("MRS %0, PMCCNTR_EL0" : "=r"(cycles_2));
>	}
>
>	return 0;
> }
>
> int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, unsigned int irq_num,
>			  bool level)
> {
>	int vcpu_id;
>	unsigned long cycles_a, cycles_b;
>
>	if (likely(vgic_initialized(kvm))) {
>		vcpu_id = vgic_update_irq_pending(kvm, cpuid, irq_num, level);
>		if (vcpu_id >= 0) {
>			start_pmu();
>			__asm__ __volatile__("MRS %0, PMCCNTR_EL0" : "=r"(cycles_a));
>			/* kick only the specified vcpu */
>			kvm_vcpu_kick(kvm_get_vcpu(kvm, vcpu_id));
>			__asm__ __volatile__("MRS %0, PMCCNTR_EL0" : "=r"(cycles_b));
>		}
>	}
>
>	return 0;
> }

Can you run some IPI-intensive benchmark in your guest and let us know if
you see improvements at that level?

Not trying to be overly pedantic here (I think your numbers suggest we
should merge this), but if the case you're optimizing doesn't happen very
often, we may not see this at the guest level or in overall CPU
utilization, and it would be very interesting to know.

Thanks!
-Christoffer
_______________________________________________
kvmarm mailing list
kvmarm@xxxxxxxxxxxxxxxxxxxxx
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm