Re: [PATCH 13/40] autonuma: CPU follow memory algorithm

Nai Xia <nai.xia@xxxxxxxxx> · Fri, 06 Jul 2012 09:00:29 +0800

On 2012-07-06 02:07, Rik van Riel wrote:
On 06/29/2012 04:01 PM, Nai Xia wrote:

Hey guys, can I say NAK to these patches?

Not necessarily the patches, but thinking about your
points some more, I thought of a much more serious
potential problem with Andrea's code.

Now I am aware that this sampling algorithm is completely broken, if we take
a few seconds to see what it is trying to solve:

Andrea's patch can only approximate the pages_accessed number in a
time unit (scan interval).
I don't think it can catch even 1% of average_page_access_frequency
on a busy workload.

It is much more "interesting" than that.

Once the first thread gets a NUMA pagefault on a
particular page, the page is made present in the
page tables and NO OTHER THREAD will get NUMA
page faults.

That means when trying to compare the weighting
of NUMA accesses between different threads in a
10 second interval, we only know THE FIRST FAULT.

We have no information on whether any other threads
tried to access the same page, because we do not
get faults more frequently.

Not only do we not get use frequency information,
we may not get the information on which threads use
which memory, at all.

Somehow Andrea's code still seems to work.

On this point alone, I agree with Andrea's reasoning:
1. This information gets averaged out in the noise.
2. If a thread statistically gets more faults than others,
then it may deserve to be biased.
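
A toy user-space model (a sketch, not code from the patch) can make
point 2 concrete. It assumes each thread touches one shared page at a
fixed, hypothetical period, that only the earliest access in a scan
interval takes the NUMA hinting fault, and that ties go to the lowest
thread index. It enumerates every combination of phase offsets, so the
result is exact rather than random. All names and constants here are
made up for illustration.

```c
#define NTHREADS      3
#define SCAN_INTERVAL 64	/* ticks per scan interval (hypothetical) */

/* Hypothetical periods: thread i touches the page every period[i] ticks. */
static const int period[NTHREADS] = { 2, 4, 8 };

struct sample_stats {
	long faults[NTHREADS];		/* hinting faults that get recorded */
	long accesses[NTHREADS];	/* accesses that really happened */
	long intervals;			/* scan intervals simulated */
};

/*
 * Simulate one scan interval for every combination of phase offsets
 * (phase[i] in [0, period[i])). In each interval only the thread with
 * the earliest first access takes the fault; the page is then present
 * and the other threads fault on nothing.
 */
void simulate_first_fault(struct sample_stats *st)
{
	int phase[NTHREADS] = { 0 };
	int i;

	for (i = 0; i < NTHREADS; i++) {
		st->faults[i] = 0;
		st->accesses[i] = 0;
	}
	st->intervals = 0;

	for (;;) {
		int winner = 0;

		for (i = 0; i < NTHREADS; i++) {
			/* accesses at phase, phase + period, ... < SCAN_INTERVAL */
			st->accesses[i] += (SCAN_INTERVAL - phase[i] +
					    period[i] - 1) / period[i];
			/* strict '<' means ties go to the lowest index */
			if (phase[i] < phase[winner])
				winner = i;
		}
		st->faults[winner]++;
		st->intervals++;

		/* advance the phases like a mixed-radix counter */
		for (i = 0; i < NTHREADS; i++) {
			if (++phase[i] < period[i])
				break;
			phase[i] = 0;
		}
		if (i == NTHREADS)
			break;	/* every combination has been visited */
	}
}
```

With periods 2, 4 and 8 the real access counts come out as
2048:1024:512, while the recorded faults are 53:8:3 over 64 intervals.
So in this model the busiest thread does collect the most faults, but
the proportions are far from the real access frequencies.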

Note that I mean only the reasoning. I am not
confident that Andrea's code really works like
this, since I have not run microbenchmarks on this
part of the code.

It would be very interesting to know why.

Note that my personal experience tells me that
sometimes you write a complex system and it works
like a charm. Later you cut out 30% of its
code, and it still works like a charm.

Sometimes a part of a system just is not that
relevant to the output of the whole benchmark,
and this fact may make it seemingly have good
resistance to false negatives/positives.
It's time to look inside with benchmarks, IMO.

Again, I have no intention of, and nothing to gain
from, disabling this algorithm. I am only curious
about the truth. I hope nobody gets offended.

Thanks,

Nai

How much sense does the following code still make,
considering we may never get all the info on which
threads use which memory?

+	/*
+	 * Generate the w_nid/w_cpu_nid from the
+	 * pre-computed mm/task_numa_weight[] and
+	 * compute w_other using the w_m/w_t info
+	 * collected from the other process.
+	 */
+	if (mm == p->mm) {
+		if (w_t > w_t_t)
+			w_t_t = w_t;
+		w_other = w_t*AUTONUMA_BALANCE_SCALE/w_t_t;
+		w_nid = task_numa_weight[nid];
+		w_cpu_nid = task_numa_weight[cpu_nid];
+		w_type = W_TYPE_THREAD;
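
For reference, the w_other line in the quoted fragment is a fixed-point
ratio: the weight is clamped against its total and then scaled. A
minimal stand-alone sketch follows. It assumes a scale of 1000 (the real
value of AUTONUMA_BALANCE_SCALE is not shown in this thread) and adds a
divide-by-zero guard that is not part of the fragment. scaled_weight()
and BALANCE_SCALE are hypothetical names.

```c
/* Assumed value; stands in for AUTONUMA_BALANCE_SCALE. */
#define BALANCE_SCALE 1000L

/*
 * Return w as a fraction of *w_total, scaled to 0..BALANCE_SCALE.
 * As in the quoted fragment, the total is raised to w first, so the
 * result can never exceed the scale.
 */
long scaled_weight(long w, long *w_total)
{
	if (w > *w_total)
		*w_total = w;
	if (*w_total == 0)	/* guard added for this sketch only */
		return 0;
	return w * BALANCE_SCALE / *w_total;
}
```

For example, a weight of 3 against a total of 12 gives 250. The ratio
is only as good as the fault counts fed into it, which is the concern
raised above.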

Andrea, what is the real reason your code works?
