* Raghavendra K T <raghavendra.kt@xxxxxxxxxxxxxxxxxx> wrote:

> * Ingo Molnar <mingo@xxxxxxxxxx> [2013-01-24 11:32:13]:
> 
> > 
> > * Raghavendra K T <raghavendra.kt@xxxxxxxxxxxxxxxxxx> wrote:
> > 
> > > From: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> > > 
> > > In undercommitted scenarios, especially in large guests, the
> > > yield_to overhead is significantly high. When the run queue length
> > > of both source and target is one, take the opportunity to bail out
> > > and return -ESRCH. This return condition can be further exploited
> > > to quickly come out of the PLE handler.
> > > 
> > > (History: Raghavendra initially worked on breaking out of the KVM
> > > PLE handler upon seeing source runqueue length = 1, but it had to
> > > export the rq length. Peter came up with the elegant idea of
> > > returning -ESRCH from the scheduler core.)
> > > 
> > > Signed-off-by: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> > > Raghavendra: added the check on the rq length of the target vcpu (thanks Avi).
> > > Reviewed-by: Srikar Dronamraju <srikar@xxxxxxxxxxxxxxxxxx>
> > > Signed-off-by: Raghavendra K T <raghavendra.kt@xxxxxxxxxxxxxxxxxx>
> > > Acked-by: Andrew Jones <drjones@xxxxxxxxxx>
> > > Tested-by: Chegu Vinod <chegu_vinod@xxxxxx>
> > > ---
> > > 
> > >  kernel/sched/core.c | 25 +++++++++++++++++++------
> > >  1 file changed, 19 insertions(+), 6 deletions(-)
> > > 
> > > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > > index 2d8927f..fc219a5 100644
> > > --- a/kernel/sched/core.c
> > > +++ b/kernel/sched/core.c
> > > @@ -4289,7 +4289,10 @@ EXPORT_SYMBOL(yield);
> > >   * It's the caller's job to ensure that the target task struct
> > >   * can't go away on us before we can do any checks.
> > >   *
> > > - * Returns true if we indeed boosted the target task.
> > > + * Returns:
> > > + *	true (>0) if we indeed boosted the target task.
> > > + *	false (0) if we failed to boost the target.
> > > + *	-ESRCH if there's no task to yield to.
> > >   */
> > >  bool __sched yield_to(struct task_struct *p, bool preempt)
> > >  {
> > > @@ -4303,6 +4306,15 @@ bool __sched yield_to(struct task_struct *p, bool preempt)
> > > 
> > >  again:
> > >  	p_rq = task_rq(p);
> > > +	/*
> > > +	 * If we're the only runnable task on the rq and target rq also
> > > +	 * has only one task, there's absolutely no point in yielding.
> > > +	 */
> > > +	if (rq->nr_running == 1 && p_rq->nr_running == 1) {
> > > +		yielded = -ESRCH;
> > > +		goto out_irq;
> > > +	}
> > 
> > Looks good to me in principle.
> > 
> > Would be nice to get more consistent benchmark numbers. Once
> > those are unambiguously showing that this is a win:
> > 
> >   Acked-by: Ingo Molnar <mingo@xxxxxxxxxx>
> 
> I ran the test with kernbench and sysbench again on a 32-core mx3850
> machine with 32-vcpu guests. The results show definite improvements.
> 
> ebizzy and dbench show a similar improvement for 1x overcommit.
> (Note that the stdev for 1x in dbench is lower; the improvement is
> now seen at only 20%.)
> 
> [ All the results are averages of 8 runs. ]
> 
> The patches benefit large-guest undercommit scenarios, so I believe
> the performance improvement with large guests is even more
> significant. [ Chegu Vinod's results show performance close to the
> no-PLE case. ] Unfortunately I do not have a machine to test larger
> guests (>32 vcpus).
> 
> Ingo, please let me know if this is okay with you.
> 
> base kernel = 3.8.0-rc4
> 
> +-----------+-----------+-----------+------------+-----------+
>             kernbench (time in sec, lower is better)
> +-----------+-----------+-----------+------------+-----------+
>          base      stdev    patched      stdev    %improve
> +-----------+-----------+-----------+------------+-----------+
> 1x     46.6028    1.8672    42.4494     1.1390     8.91234
> 2x     99.9074    9.1859    90.4050     2.6131     9.51121
> +-----------+-----------+-----------+------------+-----------+
> 
> +-----------+-----------+-----------+------------+-----------+
>             sysbench (time in sec, lower is better)
> +-----------+-----------+-----------+------------+-----------+
>          base      stdev    patched      stdev    %improve
> +-----------+-----------+-----------+------------+-----------+
> 1x     18.7402    0.3764    17.7431     0.3589     5.32065
> 2x     13.2238    0.1935    13.0096     0.3152     1.61981
> +-----------+-----------+-----------+------------+-----------+
> 
> +-----------+-----------+-----------+------------+-----------+
>             ebizzy (records/sec, higher is better)
> +-----------+-----------+-----------+------------+-----------+
>          base      stdev    patched      stdev    %improve
> +-----------+-----------+-----------+------------+-----------+
> 1x   2421.9000   19.1801  5883.1000   112.7243   142.91259
> +-----------+-----------+-----------+------------+-----------+
> 
> +-----------+-----------+-----------+------------+-----------+
>             dbench (throughput in MB/sec, higher is better)
> +-----------+-----------+-----------+------------+-----------+
>          base      stdev    patched      stdev    %improve
> +-----------+-----------+-----------+------------+-----------+
> 1x  11675.9900  857.4154 14103.5000   215.8425    20.79061
> +-----------+-----------+-----------+------------+-----------+

The numbers look pretty convincing, thanks.

The workloads were CPU-bound most of the time, right?

Thanks,

	Ingo
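
For context on the "exploited to quickly come out of the PLE handler"
remark in the changelog: the KVM-side change is not quoted in this
thread, so the following is only a minimal sketch of how a PLE-style
handler could use the new tri-state return. The function name
(ple_try_yield), the candidate array, and the loop shape are all
hypothetical, and yield_to()'s return is treated as the int-valued
tri-state described in the updated kernel-doc above, not as the
actual KVM patch.

#include <linux/sched.h>
#include <linux/errno.h>

/*
 * Hypothetical sketch, not the real KVM PLE handler: scan a set of
 * candidate vcpu tasks and try to boost one of them. The new -ESRCH
 * return lets us stop the whole scan early in the undercommitted
 * case, instead of paying the yield_to() overhead once per remaining
 * candidate.
 */
static void ple_try_yield(struct task_struct **candidates, int nr)
{
	int i;

	for (i = 0; i < nr; i++) {
		int ret = yield_to(candidates[i], false);

		if (ret > 0)
			break;	/* boosted a candidate: done */
		if (ret == -ESRCH)
			break;	/*
				 * Source and target rqs each have a
				 * single runnable task: yielding is
				 * pointless, bail out of the handler.
				 */
		/* ret == 0: could not boost this candidate, try the next */
	}
}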