On Thu, Aug 31, 2017 at 03:37:09PM +0200, Greg KH wrote:
> What patch?

Attaching the patch from this link:
https://bugs.debian.org/cgi-bin/bugreport.cgi?att=1;bug=871608;filename=handle-decreasing-steal-clock.patch;msg=5

> What is the git commit id of the patch in Linus's tree that resolves
> this issue?

The issue was fixed later in 4.11, but that might be a bigger change, so
I am not sure whether you want to take it:

2b1f967d80e8e5d7361f0e1654c842869570f573
sched/cputime: Complete nsec conversion of tick based accounting

--
Valentin
From 4b66621a06a94d22629661a9262f92b8cf5b7ca9 Mon Sep 17 00:00:00 2001
From: Michael Lass <bevan@xxxxxxxxx>
Date: Sun, 6 Aug 2017 18:09:21 +0200
Subject: [PATCH] sched/cputime: handle decreasing steal clock

On some flaky Xen hosts, the steal clock returned by
paravirt_steal_clock is not monotonically increasing but can slightly
decrease. Currently this results in an overflow (wraparound) of the u64
variable steal. Before this number is given to account_steal_time() it
is converted into cputime, so the target counter cpustat[CPUTIME_STEAL]
does not overflow as well but is instead increased by a large amount.
Due to the conversion to cputime and back into nanoseconds,
this_rq()->prev_steal_time does not correctly reflect the latest
reported steal clock afterwards, resulting in erratic behavior such as
a backwards-running cpustat[CPUTIME_STEAL].

The following is a trace from userspace of the value for steal time
reported in /proc/stat:

   time          stolen            diff
   ----          ------            ----
    0ms             784
  100ms   1844670130367   1844670129583
  200ms   1844664564089        -5566278
  300ms   1844659554439        -5009650
  400ms   1844655101417        -4453022

This issue was probably introduced by the following commits, which
deactivate a check for (steal < 0) in the Xen pv guest codepath and
allow unlimited jumps of the cpustat counters (both introduced in
v4.8):

ecb23dc6f2eff0ce64dd60351a81f376f13b12cc
03cbc732639ddcad15218c4b2046d255851ff1e3

As a workaround, ignore decreasing values of the steal clock. By not
updating this_rq()->prev_steal_time we make sure that steal time is
only accounted once the steal clock rises above the value that was
already observed and accounted for earlier.

In current kernel versions (v4.11 and higher) this issue should not
exist since the conversion between nsec and cputime has been
eliminated. Therefore all values will simply wrap, i.e. decrease as
reported by the host system.
---
 kernel/sched/cputime.c | 17 +++++++++++++----
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 5ebee3164e64..5f039f7f9294 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -262,10 +262,19 @@ static __always_inline cputime_t steal_account_process_time(cputime_t maxtime)
 #ifdef CONFIG_PARAVIRT
 	if (static_key_false(&paravirt_steal_enabled)) {
 		cputime_t steal_cputime;
-		u64 steal;
-
-		steal = paravirt_steal_clock(smp_processor_id());
-		steal -= this_rq()->prev_steal_time;
+		u64 steal_time;
+		s64 steal;
+
+		steal_time = paravirt_steal_clock(smp_processor_id());
+		steal = steal_time - this_rq()->prev_steal_time;
+
+		if (unlikely(steal < 0)) {
+			printk_ratelimited(KERN_DEBUG "cputime: steal_clock for "
+					"processor %d decreased: %llu -> %llu, "
+					"ignoring\n", smp_processor_id(),
+					this_rq()->prev_steal_time, steal_time);
+			return 0;
+		}
 
 		steal_cputime = min(nsecs_to_cputime(steal), maxtime);
 		account_steal_time(steal_cputime);
-- 
2.14.0
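
As a side note on the trace above: the magnitude of the bogus "stolen"
values matches a full u64 wrap expressed in /proc/stat ticks, since
2^64 ns divided by 10^7 ns per USER_HZ tick is about 1844674407370.
Below is a minimal userspace sketch of the pre-4.11 arithmetic (not
part of the patch; the two clock readings are made-up example values):

#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

int main(void)
{
	/* Hypothetical readings: the host's steal clock slips back slightly. */
	uint64_t prev_steal_time = 1000000000ULL; /* last accounted reading, ns */
	uint64_t steal_clock     =  999994434ULL; /* new reading, ~5.5 us less  */

	/* The same subtraction the kernel performs on u64; it wraps around. */
	uint64_t steal = steal_clock - prev_steal_time;

	printf("steal (ns)    = %" PRIu64 "\n", steal);            /* ~1.8e19 */

	/* /proc/stat reports in USER_HZ (10 ms) ticks, i.e. 10^7 ns each. */
	printf("steal (ticks) = %" PRIu64 "\n", steal / 10000000ULL); /* ~1.8e12,
	                                            same magnitude as the trace */
	return 0;
}

With the (steal < 0) check from the patch in place, the negative
difference is detected and the sample is skipped, so the wrapped value
never reaches cpustat[CPUTIME_STEAL].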