Patch "sched/cputime: Fix steal time accounting vs. CPU hotplug" has been added to the 4.5-stable tree

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



This is a note to let you know that I've just added the patch titled

    sched/cputime: Fix steal time accounting vs. CPU hotplug

to the 4.5-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     sched-cputime-fix-steal-time-accounting-vs.-cpu-hotplug.patch
and it can be found in the queue-4.5 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@xxxxxxxxxxxxxxx> know about it.


>From e9532e69b8d1d1284e8ecf8d2586de34aec61244 Mon Sep 17 00:00:00 2001
From: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Date: Fri, 4 Mar 2016 15:59:42 +0100
Subject: sched/cputime: Fix steal time accounting vs. CPU hotplug

From: Thomas Gleixner <tglx@xxxxxxxxxxxxx>

commit e9532e69b8d1d1284e8ecf8d2586de34aec61244 upstream.

On CPU hotplug the steal time accounting can keep a stale rq->prev_steal_time
value over CPU down and up. So after the CPU comes up again the delta
calculation in steal_account_process_tick() wreckages itself due to the
unsigned math:

	 u64 steal = paravirt_steal_clock(smp_processor_id());

	 steal -= this_rq()->prev_steal_time;

So if steal is smaller than rq->prev_steal_time we end up with an insane large
value which then gets added to rq->prev_steal_time, resulting in a permanent
wreckage of the accounting. As a consequence the per CPU stats in /proc/stat
become stale.

Nice trick to tell the world how idle the system is (100%) while the CPU is
100% busy running tasks. Though we prefer realistic numbers.

None of the accounting values which use a previous value to account for
fractions is reset at CPU hotplug time. update_rq_clock_task() has a sanity
check for prev_irq_time and prev_steal_time_rq, but that sanity check solely
deals with clock warps and limits the /proc/stat visible wreckage. The
prev_time values are still wrong.

Solution is simple: Reset rq->prev_*_time when the CPU is plugged in again.

Signed-off-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Acked-by: Rik van Riel <riel@xxxxxxxxxx>
Cc: Frederic Weisbecker <fweisbec@xxxxxxxxx>
Cc: Glauber Costa <glommer@xxxxxxxxxxxxx>
Cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
Fixes: commit 095c0aa83e52 "sched: adjust scheduler cpu power for stolen time"
Fixes: commit aa483808516c "sched: Remove irq time from available CPU power"
Fixes: commit e6e6685accfa "KVM guest: Steal time accounting"
Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1603041539490.3686@nanos
Signed-off-by: Ingo Molnar <mingo@xxxxxxxxxx>
Signed-off-by: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>

---
 kernel/sched/core.c  |    1 +
 kernel/sched/sched.h |   13 +++++++++++++
 2 files changed, 14 insertions(+)

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5630,6 +5630,7 @@ migration_call(struct notifier_block *nf
 
 	case CPU_UP_PREPARE:
 		rq->calc_load_update = calc_load_update;
+		account_reset_rq(rq);
 		break;
 
 	case CPU_ONLINE:
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1738,3 +1738,16 @@ static inline u64 irq_time_read(int cpu)
 }
 #endif /* CONFIG_64BIT */
 #endif /* CONFIG_IRQ_TIME_ACCOUNTING */
+
+static inline void account_reset_rq(struct rq *rq)
+{
+#ifdef CONFIG_IRQ_TIME_ACCOUNTING
+	rq->prev_irq_time = 0;
+#endif
+#ifdef CONFIG_PARAVIRT
+	rq->prev_steal_time = 0;
+#endif
+#ifdef CONFIG_PARAVIRT_TIME_ACCOUNTING
+	rq->prev_steal_time_rq = 0;
+#endif
+}


Patches currently in stable-queue which might be from tglx@xxxxxxxxxxxxx are

queue-4.5/edac-sb_edac-fix-computation-of-channel-address.patch
queue-4.5/sched-fair-avoid-using-decay_load_missed-with-a-negative-value.patch
queue-4.5/x86-iopl-64-properly-context-switch-iopl-on-xen-pv.patch
queue-4.5/sched-cputime-fix-steal_account_process_tick-to-always-return-jiffies.patch
queue-4.5/bitops-do-not-default-to-__clear_bit-for-__clear_bit_unlock.patch
queue-4.5/sched-cputime-fix-steal-time-accounting-vs.-cpu-hotplug.patch
queue-4.5/x86-iopl-fix-iopl-capability-check-on-xen-pv.patch
queue-4.5/x86-apic-fix-suspicious-rcu-usage-in-smp_trace_call_function_interrupt.patch
queue-4.5/perf-x86-intel-add-definition-for-pt-pmi-bit.patch
queue-4.5/x86-microcode-intel-make-early-loader-look-for-builtin-microcode-too.patch
queue-4.5/perf-core-fix-perf_sched_count-derailment.patch
queue-4.5/x86-microcode-untangle-from-blk_dev_initrd.patch
queue-4.5/x86-irq-cure-live-lock-in-fixup_irqs.patch
queue-4.5/x86-entry-compat-keep-ts_compat-set-during-signal-delivery.patch
--
To unsubscribe from this list: send the line "unsubscribe stable" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Kernel]     [Kernel Development Newbies]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite Hiking]     [Linux Kernel]     [Linux SCSI]