On 08/25/2010 05:43 PM, Glauber Costa wrote:
This patch proposes a common steal time implementation. When no steal time is accounted, we just add a branch to the current accounting code, that shouldn't add much overhead. When we do want to register steal time, we proceed as following: - if we would account user or system time in this tick, and there is out-of-cpu time registered, we skip it altogether, and account steal time only. - if we would account user or system time in this tick, and we got the cpu for the whole slice, we proceed normaly. - if we are idle in this tick, we flush out-of-cpu time to give it the chance to update whatever last-measure internal variable it may have. This approach is simple, but proved to work well for my test scenarios. in a UP guest on UP host, with a cpu-hog in both guest and host shows ~ 50 % steal time. steal time is also accounted proportionally, if nice values are given to the host cpu-hog. A cpu-hog in the host with no load in the guest, produces 0 % steal time, with 100 % idle, as one would expect. Signed-off-by: Glauber Costa<glommer@xxxxxxxxxx> --- include/linux/sched.h | 1 + kernel/sched.c | 29 +++++++++++++++++++++++++++++ 2 files changed, 30 insertions(+), 0 deletions(-) diff --git a/include/linux/sched.h b/include/linux/sched.h index 0478888..e571ddd 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -312,6 +312,7 @@ long io_schedule_timeout(long timeout); extern void cpu_init (void); extern void trap_init(void); extern void update_process_times(int user); +extern cputime_t (*hypervisor_steal_time)(void); extern void scheduler_tick(void); extern void sched_show_task(struct task_struct *p); diff --git a/kernel/sched.c b/kernel/sched.c index f52a880..9695c92 100644 --- a/kernel/sched.c +++ b/kernel/sched.c @@ -3157,6 +3157,16 @@ unsigned long long thread_group_sched_runtime(struct task_struct *p) return ns; } +cputime_t (*hypervisor_steal_time)(void) = NULL; + +static inline cputime_t get_steal_time_from_hypervisor(void) +{ + if (!hypervisor_steal_time) + return 0; + return hypervisor_steal_time(); +} + + /* * Account user cpu time to a process. * @p: the process that the cpu time gets accounted to @@ -3169,6 +3179,12 @@ void account_user_time(struct task_struct *p, cputime_t cputime, struct cpu_usage_stat *cpustat =&kstat_this_cpu.cpustat; cputime64_t tmp; + tmp = get_steal_time_from_hypervisor(); + if (tmp) { + account_steal_time(tmp); + return; + } + /* Add user time to process. */ p->utime = cputime_add(p->utime, cputime); p->utimescaled = cputime_add(p->utimescaled, cputime_scaled);
I see one problem here. What if get_steal_time_from_hypervisor() returns a smaller amount of time than "cputime"? Would it be better to account tmp as stealtime, and the difference (cputime - tmp) as user/sys/... time? -- All rights reversed -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html