[tip:numa/core] sched: Introduce staged average NUMA faults

Commit-ID:  a093b20dff51759e51ef907d7409bde2ce0b0c50
Gitweb:     http://git.kernel.org/tip/a093b20dff51759e51ef907d7409bde2ce0b0c50
Author:     Ingo Molnar <mingo@xxxxxxxxxx>
AuthorDate: Thu, 15 Nov 2012 09:46:36 +0100
Committer:  Ingo Molnar <mingo@xxxxxxxxxx>
CommitDate: Mon, 3 Dec 2012 10:19:31 +0100

sched: Introduce staged average NUMA faults

The current way of building the p->numa_faults[] fault statistics
(indexed as 2*node + priv) has a sampling artifact:

The continuous and immediate nature of propagating new fault
stats to the numa_faults array creates a 'pulsating' dynamic:
each value starts at the average at the beginning of the scan,
increases monotonically to about twice the average by the time
the scan finishes, and then drops back to half of that value
when the running average is applied.
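
As an illustration only, the pulsation can be reproduced with a
single-counter user-space model. This is a sketch, not kernel code:
struct immediate_stat, immediate_record() and immediate_end_scan()
are hypothetical names, and the assumption is that each fault is
added straight into the averaged counter while a scan is in progress.

```c
#include <assert.h>

/* Hypothetical model of one p->numa_faults[] slot in the old scheme. */
struct immediate_stat {
	unsigned int faults;
};

/* Assumed fault path: new faults land directly in the averaged value. */
static void immediate_record(struct immediate_stat *s, unsigned int pages)
{
	assert(s != 0);
	s->faults += pages;
}

/*
 * End of scan in the old scheme: the value is consumed first and
 * only then halved, so the balancer sees the peak of the pulse.
 */
static unsigned int immediate_end_scan(struct immediate_stat *s)
{
	unsigned int seen = s->faults;

	s->faults /= 2;
	return seen;
}
```

With a steady 100 faults per scan, a slot that starts the scan at 100
climbs to 200 by the end of the scan and falls back to 100 afterwards,
so any reader sampling mid-scan sees a value between 1x and 2x the
average.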

Since we rely on these values to balance tasks, the pulsating
nature resulted in false migrations and general noise in the
stats.

To solve this, introduce buffering of the current scan via
p->numa_faults_curr[]. The array is co-allocated with
p->numa_faults[] for efficiency reasons, but it is otherwise an
ordinary separate array.

At the end of the scan we propagate the latest stats into the
average stats value. Most of the balancing code stays unmodified.
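
The end-of-scan propagation described above can be sketched in user
space as follows. This is a minimal model under stated assumptions:
struct staged_stats, NR_NODES, staged_record() and staged_end_scan()
are hypothetical names, and staged_record() stands in for a fault
path that is not part of this patch. Only the loop body mirrors the
arithmetic in the diff below.

```c
#include <assert.h>

#define NR_NODES 2		/* hypothetical stand-in for nr_node_ids */
#define NR_SLOTS (2 * NR_NODES)	/* one shared and one private slot per node */

struct staged_stats {
	unsigned int avg[NR_SLOTS];	/* models p->numa_faults[] */
	unsigned int curr[NR_SLOTS];	/* models p->numa_faults_curr[] */
};

/* Assumed fault path: faults only touch the per-scan buffer. */
static void staged_record(struct staged_stats *s, unsigned int node,
			  unsigned int priv, unsigned int pages)
{
	unsigned int idx = 2 * node + priv;

	assert(idx < NR_SLOTS);
	s->curr[idx] += pages;
}

/* End of scan: fold the buffer into the average, then reset it. */
static void staged_end_scan(struct staged_stats *s)
{
	unsigned int idx;

	for (idx = 0; idx < NR_SLOTS; idx++) {
		unsigned int new_faults = s->curr[idx];

		s->curr[idx] = 0;

		/* Keep a simple running average. */
		s->avg[idx] += new_faults;
		s->avg[idx] /= 2;
	}
}
```

In this model the averaged value is untouched while a scan is in
progress, so a reader sees the same number at any point between two
end-of-scan updates.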

The cost of this change is that we delay the effects of the latest
round of faults by one scan, but using the partial fault info was
creating artifacts.

This instantly stabilized the page fault stats and improved
numa02-like workloads by making them converge faster.

Cc: Peter Zijlstra <a.p.zijlstra@xxxxxxxxx>
Cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
Cc: Andrea Arcangeli <aarcange@xxxxxxxxxx>
Cc: Rik van Riel <riel@xxxxxxxxxx>
Cc: Mel Gorman <mgorman@xxxxxxx>
Cc: Hugh Dickins <hughd@xxxxxxxxxx>
Signed-off-by: Ingo Molnar <mingo@xxxxxxxxxx>
---
 kernel/sched/fair.c | 20 +++++++++++++++++---
 1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 9c46b45..1ab11be 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -852,12 +852,26 @@ static void task_numa_placement(struct task_struct *p)
 
 	p->numa_scan_seq = seq;
 
+	/*
+	 * Update the fault average with the result of the latest
+	 * scan:
+	 */
 	for (node = 0; node < nr_node_ids; node++) {
 		faults = 0;
 		for (priv = 0; priv < 2; priv++) {
-			faults += p->numa_faults[2*node + priv];
-			total[priv] += p->numa_faults[2*node + priv];
-			p->numa_faults[2*node + priv] /= 2;
+			unsigned int new_faults;
+			unsigned int idx;
+
+			idx = 2*node + priv;
+			new_faults = p->numa_faults_curr[idx];
+			p->numa_faults_curr[idx] = 0;
+
+			/* Keep a simple running average: */
+			p->numa_faults[idx] += new_faults;
+			p->numa_faults[idx] /= 2;
+
+			faults += p->numa_faults[idx];
+			total[priv] += p->numa_faults[idx];
 		}
 		if (faults > max_faults) {
 			max_faults = faults;