On Mon, Mar 04, 2013 at 11:31:46PM +0530, Raghavendra K T wrote:
> This patch series further filters for a better vcpu candidate to yield
> to in the PLE handler. The main idea is to record the preempted vcpus
> using preempt notifiers and iterate over only those preempted vcpus in
> the handler. Note that the vcpus which were in a spinloop during pause
> loop exit are already filtered out.

The %improvement and patch series look good.

> Thanks Jiannan, Avi for bringing the idea and Gleb, PeterZ for
> precious suggestions during the discussion.
> Thanks Srikar for suggesting to avoid the rcu lock while checking task
> state; that has improved the overcommit cases.
>
> There are basically two approaches for the implementation.
>
> Method 1: Uses a per-vcpu preempt flag (this series).
>
> Method 2: We keep a bitmap of preempted vcpus. Using this we can
> easily iterate over the preempted vcpus.
>
> Note that method 2 needs an extra index variable to identify/map the
> bitmap to a vcpu, and it also needs static vcpu allocation.

We definitely don't want something that requires static vcpu
allocation. I think it'd be better to add another counter for the vcpu
bit assignment.

>
> I am also posting the Method 2 approach for reference in case it is
> of interest.

I guess the interest in Method 2 would come from perf numbers. Did you
try comparing Method 1 vs. Method 2?

>
> Result: decent improvement for kernbench and ebizzy.
>
> base    = 3.8.0 + undercommit patches
> patched = base + preempt patches
>
> Tested on a 32 core (no HT) mx3850 machine with a 32 vcpu guest, 8GB RAM.
>
> --+-----------+-----------+-----------+------------+-----------+
>          kernbench (exec time in sec, lower is better)
> --+-----------+-----------+-----------+------------+-----------+
>        base        stdev      patched       stdev     %improve
> --+-----------+-----------+-----------+------------+-----------+
> 1x    47.0383      4.6977      44.2584      1.2899      5.90986
> 2x    96.0071      7.1873      91.2605      7.3567      4.94401
> 3x   164.0157     10.3613     156.6750     11.4267      4.47561
> 4x   212.5768     23.7326     204.4800     13.2908      3.80888
> --+-----------+-----------+-----------+------------+-----------+
> no ple kernbench 1x result for reference: 46.056133
>
> --+-----------+-----------+-----------+------------+-----------+
>           ebizzy (records/sec, higher is better)
> --+-----------+-----------+-----------+------------+-----------+
>        base        stdev      patched       stdev     %improve
> --+-----------+-----------+-----------+------------+-----------+
> 1x  5609.2000     56.9343    6263.7000     64.7097     11.66833
> 2x  2071.9000    108.4829    2653.5000    181.8395     28.07085
> 3x  1557.4167    109.7141    1993.5000    166.3176     28.00043
> 4x  1254.7500     91.2997    1765.5000    237.5410     40.70532
> --+-----------+-----------+-----------+------------+-----------+
> no ple ebizzy 1x result for reference: 7394.9 records/sec
>
> Please let me know if you have any suggestions and comments.
>
> Raghavendra K T (2):
>   kvm: Record the preemption status of vcpus using preempt notifiers
>   kvm: Iterate over only vcpus that are preempted
>
> ----
>  include/linux/kvm_host.h | 1 +
>  virt/kvm/kvm_main.c      | 7 +++++++
>  2 files changed, 8 insertions(+)
>
> Reference patch for Method 2
> ---8<---
> Use preempt bitmap and optimize vcpu iteration using preempt notifiers
>
> From: Raghavendra K T <raghavendra.kt@xxxxxxxxxxxxxxxxxx>
>
> Record the preempted vcpus in a bitmap using preempt notifiers.
> Add the logic of iterating over only the preempted vcpus, thus making
> vcpu iteration fast.
> Thanks Jiannan, Avi for initially proposing the patch. Gleb, Peter for
> precious suggestions.
> Thanks Srikar for suggesting to remove the rcu lock while checking
> task state, which helped in reducing the overcommit overhead.
>
> Not-yet-signed-off-by: Raghavendra K T <raghavendra.kt@xxxxxxxxxxxxxxxxxx>
> ---
>  include/linux/kvm_host.h |  7 +++++++
>  virt/kvm/kvm_main.c      | 15 ++++++++++++---
>  2 files changed, 19 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index cad77fe..8c4a2409 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -252,6 +252,7 @@ struct kvm_vcpu {
>  		bool dy_eligible;
>  	} spin_loop;
>  #endif
> +	int idx;
>  	struct kvm_vcpu_arch arch;
>  };
>
> @@ -385,6 +386,7 @@ struct kvm {
>  	long mmu_notifier_count;
>  #endif
>  	long tlbs_dirty;
> +	DECLARE_BITMAP(preempt_bitmap, KVM_MAX_VCPUS);
>  };
>
>  #define kvm_err(fmt, ...) \
> @@ -413,6 +415,11 @@ static inline struct kvm_vcpu *kvm_get_vcpu(struct kvm *kvm, int i)
>  	     (vcpup = kvm_get_vcpu(kvm, idx)) != NULL; \
>  	     idx++)
>
> +#define kvm_for_each_preempted_vcpu(idx, vcpup, kvm, n) \
> +	for (idx = find_first_bit(kvm->preempt_bitmap, KVM_MAX_VCPUS); \
> +	     idx < n && (vcpup = kvm_get_vcpu(kvm, idx)) != NULL; \
> +	     idx = find_next_bit(kvm->preempt_bitmap, KVM_MAX_VCPUS, idx+1))
> +
>  #define kvm_for_each_memslot(memslot, slots) \
>  	for (memslot = &slots->memslots[0]; \
>  	     memslot < slots->memslots + KVM_MEM_SLOTS_NUM && memslot->npages;\
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index adc68fe..1db16b3 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -1770,10 +1770,12 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me)
>  	struct kvm_vcpu *vcpu;
>  	int last_boosted_vcpu = me->kvm->last_boosted_vcpu;
>  	int yielded = 0;
> +	int num_vcpus;
>  	int try = 3;
>  	int pass;
>  	int i;
> -
> +
> +	num_vcpus = atomic_read(&kvm->online_vcpus);
>  	kvm_vcpu_set_in_spin_loop(me, true);
>  	/*
>  	 * We boost the priority of a VCPU that is runnable but not
> @@ -1783,7 +1785,7 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me)
>  	 * We approximate round-robin by starting at the last boosted VCPU.
>  	 */
>  	for (pass = 0; pass < 2 && !yielded && try; pass++) {
> -		kvm_for_each_vcpu(i, vcpu, kvm) {
> +		kvm_for_each_preempted_vcpu(i, vcpu, kvm, num_vcpus) {
>  			if (!pass && i <= last_boosted_vcpu) {
>  				i = last_boosted_vcpu;
>  				continue;
> @@ -1878,6 +1880,7 @@ static int create_vcpu_fd(struct kvm_vcpu *vcpu)
>  static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 id)
>  {
>  	int r;
> +	int curr_idx;
>  	struct kvm_vcpu *vcpu, *v;
>
>  	vcpu = kvm_arch_vcpu_create(kvm, id);
> @@ -1916,7 +1919,9 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 id)
>  		goto unlock_vcpu_destroy;
>  	}
>
> -	kvm->vcpus[atomic_read(&kvm->online_vcpus)] = vcpu;
> +	curr_idx = atomic_read(&kvm->online_vcpus);
> +	kvm->vcpus[curr_idx] = vcpu;
> +	vcpu->idx = curr_idx;
>  	smp_wmb();
>  	atomic_inc(&kvm->online_vcpus);
>
> @@ -2902,6 +2907,7 @@ struct kvm_vcpu *preempt_notifier_to_vcpu(struct preempt_notifier *pn)
>  static void kvm_sched_in(struct preempt_notifier *pn, int cpu)
>  {
>  	struct kvm_vcpu *vcpu = preempt_notifier_to_vcpu(pn);
> +	clear_bit(vcpu->idx, vcpu->kvm->preempt_bitmap);
>
>  	kvm_arch_vcpu_load(vcpu, cpu);
>  }
> @@ -2911,6 +2917,9 @@ static void kvm_sched_out(struct preempt_notifier *pn,
>  {
>  	struct kvm_vcpu *vcpu = preempt_notifier_to_vcpu(pn);
>
> +	if (current->state == TASK_RUNNING)
> +		set_bit(vcpu->idx, vcpu->kvm->preempt_bitmap);
> +
>  	kvm_arch_vcpu_put(vcpu);
>  }
>
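
[Editor's note] Only the diffstat of the two Method 1 patches is quoted
above, not their bodies. For readers without the series at hand, here is
a minimal sketch of the per-vcpu preempt-flag approach the cover letter
describes, written against 3.8-era virt/kvm/kvm_main.c. The flag name
(vcpu->preempted) and the exact placement of the checks are assumptions
reconstructed from the description, not the posted patches.

/*
 * Sketch of Method 1: a per-vcpu flag instead of a per-VM bitmap.
 * The field name "preempted" is an assumption, not the posted code.
 */

/* include/linux/kvm_host.h */
struct kvm_vcpu {
	/* ... existing fields ... */
	bool preempted;	/* true while the vcpu task is scheduled out */
};

/* virt/kvm/kvm_main.c */
static void kvm_sched_in(struct preempt_notifier *pn, int cpu)
{
	struct kvm_vcpu *vcpu = preempt_notifier_to_vcpu(pn);

	/* Running again, so no longer a preempted yield candidate. */
	vcpu->preempted = false;
	kvm_arch_vcpu_load(vcpu, cpu);
}

static void kvm_sched_out(struct preempt_notifier *pn,
			  struct task_struct *next)
{
	struct kvm_vcpu *vcpu = preempt_notifier_to_vcpu(pn);

	/*
	 * Flag only tasks that were still runnable when descheduled;
	 * checking current's state here is what lets the PLE handler
	 * avoid an rcu-protected task-state lookup per candidate.
	 */
	if (current->state == TASK_RUNNING)
		vcpu->preempted = true;
	kvm_arch_vcpu_put(vcpu);
}

/* and in kvm_vcpu_on_spin(), inside the existing candidate loop: */
	kvm_for_each_vcpu(i, vcpu, kvm) {
		/* ... pass/last_boosted_vcpu handling as before ... */
		if (!ACCESS_ONCE(vcpu->preempted))
			continue;	/* skip vcpus that are not preempted */
		/* ... eligibility checks and kvm_vcpu_yield_to() ... */
	}

Compared with the Method 2 bitmap above, the flag needs no idx mapping
into a fixed-size bitmap and hence no static vcpu allocation, at the
cost of still walking every online vcpu in the handler rather than only
the set bits; that trade-off is what the Method 1 vs. Method 2 perf
comparison requested above would quantify.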