Re: Xen: decreasing cpu steal clock counter

On Thu, Aug 31, 2017 at 03:37:09PM +0200, Greg KH wrote:
> What patch?

Attaching the patch from this link:

  https://bugs.debian.org/cgi-bin/bugreport.cgi?att=1;bug=871608;filename=handle-decreasing-steal-clock.patch;msg=5

> What is the git commit id of the patch in Linus's tree that resolves
> this issue?

The issue was fixed later in 4.11, but this might be a bigger change
so not sure if you want to take that:

  2b1f967d80e8e5d7361f0e1654c842869570f573
  sched/cputime: Complete nsec conversion of tick based accounting

-- 
Valentin
From 4b66621a06a94d22629661a9262f92b8cf5b7ca9 Mon Sep 17 00:00:00 2001
From: Michael Lass <bevan@xxxxxxxxx>
Date: Sun, 6 Aug 2017 18:09:21 +0200
Subject: [PATCH] sched/cputime: handle decreasing steal clock

On some flaky Xen hosts, the steal clock returned by paravirt_steal_clock is
not monotonically increasing but can slightly decrease. Currently this results
in an underflow of the u64 variable steal. Before this number is passed to
account_steal_time() it is converted into cputime, so the target cpustat
counter cpustat[CPUTIME_STEAL] does not overflow as well but is instead
increased by a large amount. Due to the conversion to cputime and back into
nanoseconds, this_rq()->prev_steal_time does not correctly reflect the latest
reported steal clock afterwards, resulting in erratic behavior such as a
backwards running cpustat[CPUTIME_STEAL]. The following is a userspace trace
of the steal time value reported in /proc/stat:

time    stolen         diff
----    ------         ----
0ms     784
100ms   1844670130367  1844670129583
200ms   1844664564089  -5566278
300ms   1844659554439  -5009650
400ms   1844655101417  -4453022

This issue was probably introduced by the following commits, which deactivate a
check for (steal < 0) in the Xen pv guest codepath and allow unlimited jumps of
the cpustat counters (both introduced in v4.8):
ecb23dc6f2eff0ce64dd60351a81f376f13b12cc
03cbc732639ddcad15218c4b2046d255851ff1e3

As a workaround, ignore decreasing values of the steal clock. By not updating
this_rq()->prev_steal_time we make sure that steal time is only accounted once
the steal clock rises above the value that was already observed and accounted
for earlier.

In current kernel versions (v4.11 and higher) this issue should not exist,
since the conversion between nsec and cputime has been eliminated. There, all
values wrap around consistently, i.e. the counters simply decrease just as
reported by the host system.
---
 kernel/sched/cputime.c | 17 +++++++++++++----
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 5ebee3164e64..5f039f7f9294 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -262,10 +262,19 @@ static __always_inline cputime_t steal_account_process_time(cputime_t maxtime)
 #ifdef CONFIG_PARAVIRT
 	if (static_key_false(&paravirt_steal_enabled)) {
 		cputime_t steal_cputime;
-		u64 steal;
-
-		steal = paravirt_steal_clock(smp_processor_id());
-		steal -= this_rq()->prev_steal_time;
+		u64 steal_time;
+		s64 steal;
+
+		steal_time = paravirt_steal_clock(smp_processor_id());
+		steal = steal_time - this_rq()->prev_steal_time;
+
+		if (unlikely(steal < 0)) {
+			printk_ratelimited(KERN_DEBUG "cputime: steal_clock for "
+				"processor %d decreased: %llu -> %llu, "
+				"ignoring\n", smp_processor_id(),
+				this_rq()->prev_steal_time, steal_time);
+			return 0;
+		}
 
 		steal_cputime = min(nsecs_to_cputime(steal), maxtime);
 		account_steal_time(steal_cputime);
-- 
2.14.0
