On Thu, Aug 26, 2010 at 05:19:23PM -0400, Rik van Riel wrote: > On 08/25/2010 05:43 PM, Glauber Costa wrote: > >This patch proposes a common steal time implementation. When no > >steal time is accounted, we just add a branch to the current > >accounting code, that shouldn't add much overhead. > > > >When we do want to register steal time, we proceed as following: > >- if we would account user or system time in this tick, and there is > > out-of-cpu time registered, we skip it altogether, and account steal > > time only. > >- if we would account user or system time in this tick, and we got the > > cpu for the whole slice, we proceed normaly. > >- if we are idle in this tick, we flush out-of-cpu time to give it the > > chance to update whatever last-measure internal variable it may have. > > > >This approach is simple, but proved to work well for my test scenarios. > >in a UP guest on UP host, with a cpu-hog in both guest and host shows > >~ 50 % steal time. steal time is also accounted proportionally, if > >nice values are given to the host cpu-hog. > > > >A cpu-hog in the host with no load in the guest, produces 0 % steal time, > >with 100 % idle, as one would expect. > > > >Signed-off-by: Glauber Costa<glommer@xxxxxxxxxx> > >--- > > include/linux/sched.h | 1 + > > kernel/sched.c | 29 +++++++++++++++++++++++++++++ > > 2 files changed, 30 insertions(+), 0 deletions(-) > > > >diff --git a/include/linux/sched.h b/include/linux/sched.h > >index 0478888..e571ddd 100644 > >--- a/include/linux/sched.h > >+++ b/include/linux/sched.h > >@@ -312,6 +312,7 @@ long io_schedule_timeout(long timeout); > > extern void cpu_init (void); > > extern void trap_init(void); > > extern void update_process_times(int user); > >+extern cputime_t (*hypervisor_steal_time)(void); > > extern void scheduler_tick(void); > > > > extern void sched_show_task(struct task_struct *p); > >diff --git a/kernel/sched.c b/kernel/sched.c > >index f52a880..9695c92 100644 > >--- a/kernel/sched.c > >+++ b/kernel/sched.c > >@@ -3157,6 +3157,16 @@ unsigned long long thread_group_sched_runtime(struct task_struct *p) > > return ns; > > } > > > >+cputime_t (*hypervisor_steal_time)(void) = NULL; > >+ > >+static inline cputime_t get_steal_time_from_hypervisor(void) > >+{ > >+ if (!hypervisor_steal_time) > >+ return 0; > >+ return hypervisor_steal_time(); > >+} > >+ > >+ > > /* > > * Account user cpu time to a process. > > * @p: the process that the cpu time gets accounted to > >@@ -3169,6 +3179,12 @@ void account_user_time(struct task_struct *p, cputime_t cputime, > > struct cpu_usage_stat *cpustat =&kstat_this_cpu.cpustat; > > cputime64_t tmp; > > > >+ tmp = get_steal_time_from_hypervisor(); > >+ if (tmp) { > >+ account_steal_time(tmp); > >+ return; > >+ } > >+ > > /* Add user time to process. */ > > p->utime = cputime_add(p->utime, cputime); > > p->utimescaled = cputime_add(p->utimescaled, cputime_scaled); > > I see one problem here. > > What if get_steal_time_from_hypervisor() returns a smaller > amount of time than "cputime"? > > Would it be better to account tmp as stealtime, and the > difference (cputime - tmp) as user/sys/... time? There is also the case in which tmp is greater than cputime, but not a multiple of it. In which case, I believe we should account cputime - (tmp % cputime) as user/sys too. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html