* Raghavendra K T <raghavendra.kt@xxxxxxxxxxxxxxxxxx> wrote:

> On 01/25/2013 04:17 PM, Ingo Molnar wrote:
> >
> > * Raghavendra K T <raghavendra.kt@xxxxxxxxxxxxxxxxxx> wrote:
> >
> >> * Ingo Molnar <mingo@xxxxxxxxxx> [2013-01-24 11:32:13]:
> >>
> >>>
> >>> * Raghavendra K T <raghavendra.kt@xxxxxxxxxxxxxxxxxx> wrote:
> >>>
> >>>> From: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> >>>>
> >>>> In undercommitted scenarios, especially with large guests, the
> >>>> yield_to overhead is significantly high. When the run queue length
> >>>> of both source and target is one, take the opportunity to bail out
> >>>> and return -ESRCH. This return condition can be further exploited
> >>>> to quickly come out of the PLE handler.
> >>>>
> >>>> (History: Raghavendra initially worked on breaking out of the kvm
> >>>>  ple handler upon seeing source runqueue length = 1, but that had
> >>>>  to export the rq length. Peter came up with the elegant idea of
> >>>>  returning -ESRCH from the scheduler core.)
> >>>>
> >>>> Signed-off-by: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> >>>> [Raghavendra: added the check on the target vcpu's rq length (thanks Avi)]
> >>>> Reviewed-by: Srikar Dronamraju <srikar@xxxxxxxxxxxxxxxxxx>
> >>>> Signed-off-by: Raghavendra K T <raghavendra.kt@xxxxxxxxxxxxxxxxxx>
> >>>> Acked-by: Andrew Jones <drjones@xxxxxxxxxx>
> >>>> Tested-by: Chegu Vinod <chegu_vinod@xxxxxx>
> >>>> ---
> >>>>
> >>>>  kernel/sched/core.c | 25 +++++++++++++++++++------
> >>>>  1 file changed, 19 insertions(+), 6 deletions(-)
> >>>>
> >>>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> >>>> index 2d8927f..fc219a5 100644
> >>>> --- a/kernel/sched/core.c
> >>>> +++ b/kernel/sched/core.c
> >>>> @@ -4289,7 +4289,10 @@ EXPORT_SYMBOL(yield);
> >>>>   * It's the caller's job to ensure that the target task struct
> >>>>   * can't go away on us before we can do any checks.
> >>>>   *
> >>>> - * Returns true if we indeed boosted the target task.
> >>>> + * Returns:
> >>>> + *	true (>0) if we indeed boosted the target task.
> >>>> + *	false (0) if we failed to boost the target.
> >>>> + *	-ESRCH if there's no task to yield to.
> >>>>   */
> >>>>  bool __sched yield_to(struct task_struct *p, bool preempt)
> >>>>  {
> >>>> @@ -4303,6 +4306,15 @@ bool __sched yield_to(struct task_struct *p, bool preempt)
> >>>>
> >>>>  again:
> >>>>  	p_rq = task_rq(p);
> >>>> +	/*
> >>>> +	 * If we're the only runnable task on the rq and target rq also
> >>>> +	 * has only one task, there's absolutely no point in yielding.
> >>>> +	 */
> >>>> +	if (rq->nr_running == 1 && p_rq->nr_running == 1) {
> >>>> +		yielded = -ESRCH;
> >>>> +		goto out_irq;
> >>>> +	}
> >>>
> >>> Looks good to me in principle.
> >>>
> >>> Would be nice to get more consistent benchmark numbers. Once
> >>> those are unambiguously showing that this is a win:
> >>>
> >>>   Acked-by: Ingo Molnar <mingo@xxxxxxxxxx>
> >>>
> >>
> >> I ran the tests with kernbench and sysbench again on a 32-core
> >> mx3850 machine with 32-vcpu guests. The results show definite
> >> improvements.
> >>
> >> ebizzy and dbench show a similar improvement for 1x overcommit
> >> (note that the stdev for 1x dbench is smaller now; the improvement
> >> is seen at only 20%).
> >>
> >> [ All the numbers are averages over 8 runs. ]
> >>
> >> The patches benefit large-guest undercommit scenarios, so I believe
> >> that with large guests the performance improvement is even more
> >> significant. [ Chegu Vinod's results show performance close to the
> >> no-PLE case. ] Unfortunately I do not have a machine to test larger
> >> guests (>32).
> >>
> >> Ingo, please let me know if this is okay with you.
> >>
> >> base kernel = 3.8.0-rc4
> >>
> >> +-----------+-----------+-----------+------------+-----------+
> >>             kernbench (time in sec, lower is better)
> >> +-----------+-----------+-----------+------------+-----------+
> >>        base       stdev     patched      stdev    %improve
> >> +-----------+-----------+-----------+------------+-----------+
> >> 1x    46.6028     1.8672     42.4494     1.1390     8.91234
> >> 2x    99.9074     9.1859     90.4050     2.6131     9.51121
> >> +-----------+-----------+-----------+------------+-----------+
> >>
> >> +-----------+-----------+-----------+------------+-----------+
> >>             sysbench (time in sec, lower is better)
> >> +-----------+-----------+-----------+------------+-----------+
> >>        base       stdev     patched      stdev    %improve
> >> +-----------+-----------+-----------+------------+-----------+
> >> 1x    18.7402     0.3764     17.7431     0.3589     5.32065
> >> 2x    13.2238     0.1935     13.0096     0.3152     1.61981
> >> +-----------+-----------+-----------+------------+-----------+
> >>
> >> +-----------+-----------+-----------+------------+-----------+
> >>             ebizzy (records/sec, higher is better)
> >> +-----------+-----------+-----------+------------+-----------+
> >>        base       stdev     patched      stdev    %improve
> >> +-----------+-----------+-----------+------------+-----------+
> >> 1x  2421.9000    19.1801   5883.1000   112.7243   142.91259
> >> +-----------+-----------+-----------+------------+-----------+
> >>
> >> +-----------+-----------+-----------+------------+-----------+
> >>             dbench (throughput MB/sec, higher is better)
> >> +-----------+-----------+-----------+------------+-----------+
> >>        base       stdev     patched      stdev    %improve
> >> +-----------+-----------+-----------+------------+-----------+
> >> 1x 11675.9900   857.4154  14103.5000   215.8425    20.79061
> >> +-----------+-----------+-----------+------------+-----------+
> >
> > The numbers look pretty convincing, thanks. The workloads were
> > CPU bound most of the time, right?
>
> Yes, CPU bound most of the time. I also used tmpfs to reduce
> I/O overhead (for dbench).

Ok, cool.

Which tree will this be upstreamed through - the KVM tree? I'd
suggest the KVM tree, because KVM will be the one exposed to the
effects of this change.

Thanks,

	Ingo
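As context for the "can be further exploited to quickly come out of
the PLE handler" remark above, here is a minimal caller-side sketch of
how a pause-loop-exit handler could consume the new return value.
Everything here is hypothetical: example_ple_handler(), the candidates
array, and its length are illustrative, not the actual
virt/kvm/kvm_main.c code, and the sketch assumes yield_to() ends up
returning an int as its new kerneldoc implies (the posted hunk still
shows a bool prototype, which could not carry -ESRCH back to callers).

#include <linux/errno.h>
#include <linux/sched.h>

/*
 * Hypothetical PLE-handler-style loop, for illustration only.
 * Assumes the caller already holds a reference on each candidate
 * task, per yield_to()'s documented requirement.
 */
static void example_ple_handler(struct task_struct **candidates, int nr)
{
	int i;

	for (i = 0; i < nr; i++) {
		int yielded = yield_to(candidates[i], true);

		if (yielded > 0)
			break;	/* boosted a candidate: we are done */

		if (yielded == -ESRCH) {
			/*
			 * Source and target rq each hold a single
			 * task: the system is undercommitted, so
			 * probing more candidates would be pure
			 * yield_to() overhead.
			 */
			break;
		}
		/* yielded == 0: candidate not usable, try the next one */
	}
}

The value of the early break is that each probe otherwise pays the
irq-disable and rq-locking cost inside yield_to(); in a fully
undercommitted system every candidate would fail the same way, so the
first -ESRCH is a good signal to stop probing altogether.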