On Mon, 2010-08-30 at 19:03 -0400, Rik van Riel wrote:
>
> > I think it basically comes down to adding "sched_clock_unstolen()" which
> > the scheduler can use to measure time a process spends running, and
> > sched_clock() for measuring sleep times. In the normal case,
> > sched_clock_unstolen() would be the same as sched_clock().
>
> That requires the host to export (any time the guest is scheduled
> in), the amount of CPU time the VCPU thread has used, and the time
> the VCPU was scheduled in.
>
> Since the VCPU must be running when it is examining these variables,
> it can calculate the additional time (since it was last scheduled)
> to account to the task, and remember the currently calculated time
> in its own per-vcpu variable, so next time it can get a delta again.

I think it's easier (and sufficient) for the host to tell the guest how
long it was _not_ running. That can simply be passed in when you start
the vcpu again and doesn't need a fancy communication channel.

The guest's sched_clock() will measure wall time, the guest's
sched_clock_stolen() will report the accumulation of these stolen times.
Then you can make sched_clock_unstolen() be
sched_clock() - sched_clock_stolen().

And like Jeremy said, if you make the sched_fair stuff use
sched_clock_unstolen(), things should more or less work.

The problem with all that is that you'll start to schedule on unstolen
time instead of wall time, which might not give the best results for
things like latencies, but I guess that's one of the prices you pay for
using virt.

Also, as was said yesterday, you need some factor in update_cpu_power();
a quick hack might be to add all stolen time to sched_rt_avg_update().
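
To make the accounting above concrete, here is a rough userspace model of
sched_clock_unstolen() = sched_clock() - sched_clock_stolen(). It is only a
sketch, not kernel code: struct vcpu_clock, vcpu_enter() and vcpu_run() are
made-up names for illustration, and the clock functions take a pointer
argument here, whereas the real sched_clock() takes none.

```c
#include <stdint.h>

/* Hypothetical per-vcpu state; the names are illustrative, not kernel API. */
struct vcpu_clock {
	uint64_t wall_ns;   /* wall time, i.e. what sched_clock() reports */
	uint64_t stolen_ns; /* accumulated time the vcpu was _not_ running */
};

/*
 * Model of vcpu entry: the host passes in how long the vcpu was
 * descheduled. Wall time advanced during that gap, and all of it
 * counts as stolen.
 */
void vcpu_enter(struct vcpu_clock *vc, uint64_t not_running_ns)
{
	vc->wall_ns += not_running_ns;
	vc->stolen_ns += not_running_ns;
}

/* Model of the guest actually running on a physical cpu for run_ns. */
void vcpu_run(struct vcpu_clock *vc, uint64_t run_ns)
{
	vc->wall_ns += run_ns;
}

/* Wall time as seen by the guest. */
uint64_t sched_clock(const struct vcpu_clock *vc)
{
	return vc->wall_ns;
}

/* Accumulation of the stolen deltas handed over on each vcpu entry. */
uint64_t sched_clock_stolen(const struct vcpu_clock *vc)
{
	return vc->stolen_ns;
}

/* Time the vcpu really ran: wall time minus stolen time. */
uint64_t sched_clock_unstolen(const struct vcpu_clock *vc)
{
	return sched_clock(vc) - sched_clock_stolen(vc);
}
```

In this model, a vcpu that runs for 5000 ns, is descheduled for 2000 ns and
then runs for another 3000 ns sees 10000 ns of wall time, 2000 ns stolen and
8000 ns unstolen; the 8000 ns figure is what the sched_fair runtime
accounting would use under this scheme.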