On 2014/11/24 18:53, Christoffer Dall wrote:
> On Mon, Nov 24, 2014 at 03:53:16PM +0800, Shannon Zhao wrote:
>> Hi Marc, Christoffer,
>>
>> On 2014/11/23 4:04, Christoffer Dall wrote:
>>> On Wed, Nov 19, 2014 at 06:11:25PM +0800, Shannon Zhao wrote:
>>>> When calling kvm_vgic_inject_irq to inject an interrupt, we can know
>>>> which vcpu the interrupt is for from the irq_num and the cpuid, so we
>>>> should kick only that vcpu instead of iterating through all of them.
>>>>
>>>> Signed-off-by: Shannon Zhao <zhaoshenglong@xxxxxxxxxx>
>>>
>>> This looks reasonable to me:
>>>
>>> Reviewed-by: Christoffer Dall <christoffer.dall@xxxxxxxxxx>
>>>
>>> But as Marc said, we have to consider the churn introduced by more
>>> changes to the vgic (that file is being hammered pretty intensely
>>> these days), so if you feel this is an urgent optimization, it would
>>> be useful to see some data backing it up.
>>>
>>
>> Today I ran a test that measures the cycles spent in kvm_vgic_inject_irq
>> using the PMU. I only measured the cycles for SPIs, using virtio-net.
>> Test steps:
>> 1) Start a VM with 8 VCPUs.
>> 2) In the guest, bind the virtio irq to CPU8; ping the VM from the
>>    host and collect the cycle counts.
>>
>> The test shows that without this patch the cost is about 3700 cycles
>> (range 3300-5000), and with this patch it is about 3000 cycles
>> (range 2500-3200). From this test, I think the patch brings some
>> improvement.
>
> Are these averaged numbers?
>
Yes :-)

>>
>> The test code is below. Since vgic_update_irq_pending is almost
>> unchanged between the two versions, I only measured the cycles of
>> the kick.
>>
>> int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, unsigned int irq_num,
>>			bool level)
>> {
>>	unsigned long cycles_1, cycles_2;
>>
>>	if (likely(vgic_initialized(kvm)) &&
>>	    vgic_update_irq_pending(kvm, cpuid, irq_num, level)) {
>>		start_pmu();
>>		__asm__ __volatile__("MRS %0, PMCCNTR_EL0" : "=r"(cycles_1));
>>		vgic_kick_vcpus(kvm);
>>		__asm__ __volatile__("MRS %0, PMCCNTR_EL0" : "=r"(cycles_2));
>>	}
>>
>>	return 0;
>> }
>>
>> int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, unsigned int irq_num,
>>			bool level)
>> {
>>	int vcpu_id;
>>	unsigned long cycles_a, cycles_b;
>>
>>	if (likely(vgic_initialized(kvm))) {
>>		vcpu_id = vgic_update_irq_pending(kvm, cpuid, irq_num, level);
>>		if (vcpu_id >= 0) {
>>			start_pmu();
>>			__asm__ __volatile__("MRS %0, PMCCNTR_EL0" : "=r"(cycles_a));
>>			/* kick the specified vcpu */
>>			kvm_vcpu_kick(kvm_get_vcpu(kvm, vcpu_id));
>>			__asm__ __volatile__("MRS %0, PMCCNTR_EL0" : "=r"(cycles_b));
>>		}
>>	}
>>
>>	return 0;
>> }
>>
>
> Can you run some IPI-intensive benchmark in your guest and let us know
> if you see improvements on that level?
>
Cool, I'll try to find some benchmarks and run them. Are there any
IPI-intensive benchmarks you would suggest?

> Not trying to be overly pedantic here (I think your numbers suggest we
> should merge this), but if the case you're optimizing doesn't happen
> very often, we may not see this on a guest level or overall CPU
> utilization level, and it would be very interesting to know.
>
Yeah, it would be interesting.

Thanks,
Shannon
_______________________________________________
kvmarm mailing list
kvmarm@xxxxxxxxxxxxxxxxxxxxx
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm