2016-06-07 19:41 GMT+08:00 Wanpeng Li <wanpeng.li@xxxxxxxxxxx>: > On 07/06/2016 10:00, Wanpeng Li wrote: > >> From: Wanpeng Li <wanpeng.li@xxxxxxxxxxx> >> >> Commit e9532e69b8d1 ("sched/cputime: Fix steal time accounting vs. CPU >> hotplug") >> set rq->prev_* to 0 after a cpu hotplug comes back in order to fix the >> scenario: >> >> | steal is smaller than rq->prev_steal_time we end up with an insane large >> | value which then gets added to rq->prev_steal_time, resulting in a >> permanent >> | wreckage of the accounting. >> >> However, it is still buggy. >> >> rq->prev_steal_time = 0: >> >> As Rik pointed out: >> >> | setting rq->prev_irq_time to 0 in the guest, and then getting a giant >> value from >> | the host, could result in a very large of steal_jiffies. >> >> rq->prev_steal_time_rq = 0: >> >> | steal = paravirt_steal_clock(cpu_of(rq)); >> | steal -= rq->prev_steal_time_rq; >> | >> | if (unlikely(steal > delta)) >> | steal = delta; >> | >> | rq->prev_steal_time_rq += steal; >> | delta -= steal; >> | >> | rq->clock_task += delta; >> >> steal is a giant value and rq->prev_steal_time_rq is 0, >> rq->prev_steal_time_rq >> grows in delta granularity, rq->clock_task can't ramp up until >> rq->prev_steal_time_rq >> catches up steal clock since delta value will be 0 after reducing steal >> time from >> normal execution time. That's why I obersved that cpuhg/1-12 continue >> running >> until rq->prev_steal_time_rq catches up steal clock timestamp. >> >> I believe rq->prev_irq_time has similar issue. So this patch fix it by >> setting >> rq->prev_* to current irq time and steal clock timestamp after a cpu >> hotplug >> comes back. > > I'm not sure this patch is necessary. Instead you could just revert > commit e9532e69b8d1. The previous patch obviously makes it unnecessary > to reset rq->prev_steal_time and rq->prev_steal_time_rq, and the reset > of rq->prev_irq_time looks like a no-op to me. The reason why I'm not just simple revert it is that commit mentioned "steal is smaller than rq->prev_steal_time we end up with an insane large value which then gets added to rq->prev_steal_time, resulting in a permanent wreckage of the accounting." Though I didn't meet such scenario. So I just do what that commit really want to do. Regards, Wanpeng Li -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html