On Mon, 2011-01-24 at 13:06 -0500, Glauber Costa wrote:
> This is a first proposal for using steal time information
> to influence the scheduler. There are a lot of optimizations
> and fine-grained adjustments to be done, but it is working
> reasonably well so far for me (mostly).
>
> With this patch (and some host pinning to demonstrate the situation),
> two vcpus with very different steal time (say 80% vs 1%) will not get
> an even distribution of processes. This is a situation that can
> naturally arise, especially in overcommitted scenarios. Previously,
> the guest scheduler would wrongly think that all cpus have the same
> ability to run processes, lowering the overall throughput.
>
> Signed-off-by: Glauber Costa <glommer@xxxxxxxxxx>
> CC: Rik van Riel <riel@xxxxxxxxxx>
> CC: Jeremy Fitzhardinge <jeremy.fitzhardinge@xxxxxxxxxx>
> CC: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> CC: Avi Kivity <avi@xxxxxxxxxx>
> ---
>  include/linux/sched.h |    1 +
>  kernel/sched.c        |    4 ++++
>  kernel/sched_fair.c   |   10 ++++++++++
>  3 files changed, 15 insertions(+), 0 deletions(-)
>
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index d747f94..beab72d 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -302,6 +302,7 @@ long io_schedule_timeout(long timeout);
>  extern void cpu_init (void);
>  extern void trap_init(void);
>  extern void update_process_times(int user);
> +extern cputime_t (*hypervisor_steal_time)(void);
>  extern void scheduler_tick(void);
>
>  extern void sched_show_task(struct task_struct *p);
> diff --git a/kernel/sched.c b/kernel/sched.c
> index 3b3e88d..c460f0d 100644
> --- a/kernel/sched.c
> +++ b/kernel/sched.c
> @@ -501,6 +501,8 @@ struct rq {
>  	struct sched_domain *sd;
>
>  	unsigned long cpu_power;
> +	unsigned long real_ticks;
> +	unsigned long total_ticks;
>
>  	unsigned char idle_at_tick;
>  	/* For active balancing */
> @@ -3533,10 +3535,12 @@ static int touch_steal_time(int is_idle)
>  	if (is_idle)
>  		return 0;
>
> +	rq->total_ticks++;
>  	if (steal) {
>  		account_steal_time(steal);
>  		return 1;
>  	}
> +	rq->real_ticks++;
>  	return 0;
>  }

yuck!! is ticks really the best you can do? I thought kvm had a
ns-resolution steal-time clock?

> diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
> index c62ebae..1364c28 100644
> --- a/kernel/sched_fair.c
> +++ b/kernel/sched_fair.c
> @@ -2509,6 +2509,16 @@ static void update_cpu_power(struct sched_domain *sd, int cpu)
>  		power >>= SCHED_LOAD_SHIFT;
>  	}
>
> +	WARN_ON(cpu_rq(cpu)->real_ticks > cpu_rq(cpu)->total_ticks);
> +
> +	if (cpu_rq(cpu)->total_ticks) {
> +		power *= cpu_rq(cpu)->real_ticks;
> +		power /= cpu_rq(cpu)->total_ticks;
> +	}
> +
> +	cpu_rq(cpu)->total_ticks = 0;
> +	cpu_rq(cpu)->real_ticks = 0;
> +
>  	sdg->cpu_power_orig = power;
>
>  	if (sched_feat(ARCH_POWER))

I would really much rather see you change update_rq_clock_task() and
subtract your ns-resolution steal time from our wall time;
update_rq_clock_task() already updates the cpu_power relative to the
remaining time available.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html