Re: [PATCH 13/40] autonuma: CPU follow memory algorithm

Nai Xia <nai.xia@xxxxxxxxx> · Mon, 02 Jul 2012 16:31:40 +0800

On 2012-07-02 16:17, Rik van Riel wrote:
On 07/02/2012 03:56 AM, Nai Xia wrote:

On 2012-07-02 15:36, Rik van Riel wrote:
On 06/30/2012 11:10 AM, Nai Xia wrote:

Yes, pte_numa or pte_young works the same way and they both can
answer the problem of "which pages were accessed since last scan".
For LRU, it's OK, it's quite enough. But for numa balancing it's NOT.
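
To make the distinction concrete, here is a toy Python model of a one-bit-per-scan sample. It is only a sketch under stated assumptions, not autonuma's code: the node names, the access counts, and the function sample_access_bits are all hypothetical, and the model assumes each node gets one "touched since last scan" bit per interval.

```python
from collections import Counter

def sample_access_bits(accesses_per_interval, intervals):
    """Return, per node, the number of scan intervals in which its bit was set.

    accesses_per_interval maps a node name to how many times that node
    touches the page during one scan interval (hypothetical workload).
    """
    seen = Counter()
    for _ in range(intervals):
        for node, count in accesses_per_interval.items():
            # One bit per interval: set if the node touched the page at all,
            # no matter how many times it did so.
            if count > 0:
                seen[node] += 1
    return dict(seen)

# Assumed workload: node0 is a heavy user of the page, node1 a light one.
workload = {"node0": 1000, "node1": 1}
bits = sample_access_bits(workload, intervals=10)
true_counts = {node: n * 10 for node, n in workload.items()}
print(bits)         # {'node0': 10, 'node1': 10}
print(true_counts)  # {'node0': 10000, 'node1': 10}
```

In this model both nodes look identical to the sampler even though their real access counts differ by a factor of 1000. That is enough for an LRU-style "was it used recently" decision, but it says nothing about which node is the hotter user.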

Getting LRU right may be much more important than getting
NUMA balancing right.

Retrieving wrongly evicted data from disk can be a million
times slower than fetching data from RAM, while the
penalty for accessing a remote NUMA node is only 20% or so.
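
A rough back-of-the-envelope version of this comparison, as a Python sketch. The two penalty factors are the figures quoted above; the 0.1% share of wrong decisions and the helper average_cost are assumptions made purely for illustration.

```python
def average_cost(wrong_fraction, penalty_factor):
    """Average access cost relative to a correctly placed access (cost 1.0).

    wrong_fraction: share of accesses that hit the bad case (assumed value).
    penalty_factor: cost of a bad access relative to a good one.
    """
    return (1 - wrong_fraction) * 1.0 + wrong_fraction * penalty_factor

wrong = 0.001  # assumption: 0.1% of accesses land on a wrong decision

# Wrong eviction: the access goes to disk, "a million times slower".
disk = average_cost(wrong, 1_000_000)
# Wrong NUMA placement: the access goes to a remote node, "20% or so" slower.
remote = average_cost(wrong, 1.2)

print(f"wrong evictions:  {disk:.3f}x average cost")
print(f"wrong NUMA nodes: {remote:.4f}x average cost")
```

Under these assumed numbers the same error rate costs roughly a thousandfold slowdown for eviction mistakes and a fraction of a percent for NUMA placement mistakes.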

We also should care about the hotness of the page sets, since if the
workloads are complex we should NOT be expecting that "if this page
is accessed once, then it's always in my CPU cache during the whole
last scan interval".

The difference between LRU and the problem you are trying to deal
with looks so obvious to me that I am worried you are still
mixing them up :(

For autonuma, it may be fine to have a lower likelihood of
obtaining an optimum result, because the penalty for getting
it wrong is so much lower.

As I said, I actually want to see some detailed analysis
showing that this sampling really plays as important a role
in benchmarks as it is claimed to. Not a quick
"lower likelihood than optimum" conclusion.

Please, Rik, I know your points, you don't have to explain
anymore. But I just cannot follow without research data.

What kind of data are you looking for?

I have seen a lot of generic comments in your emails,
and one gut feeling about Andrea's sampling algorithm,
but I seem to have missed the details of exactly what
you are looking for.

Btw, I share your feeling that Andrea's sampling
algorithm will probably not be able to distinguish
between NUMA nodes that are very frequent users of
a page, and NUMA nodes that use the same page much
less frequently.

However, I suspect that the penalty of getting it
wrong will be fairly low, while the overhead of
getting access frequency information will be
prohibitively high. There is a reason nobody uses
LRU nowadays, but a clock style algorithm instead.
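
The trade-off between exact LRU and a clock-style approximation can be sketched in Python as follows. This is a textbook second-chance simulation, not the kernel's reclaim code; the function names and the reference string are made up for illustration.

```python
from collections import OrderedDict

def lru_replace(references, frames):
    """Exact LRU: every hit reorders the list. Returns the fault count."""
    cache = OrderedDict()
    faults = 0
    for page in references:
        if page in cache:
            cache.move_to_end(page)    # bookkeeping on every single access
            continue
        faults += 1
        if len(cache) >= frames:
            cache.popitem(last=False)  # evict the least recently used page
        cache[page] = True
    return faults

def clock_replace(references, frames):
    """Clock (second chance): a hit only sets one bit. Returns the fault count."""
    pages = [None] * frames     # page held by each frame
    ref_bit = [False] * frames  # one "recently used" bit per frame
    hand = 0
    faults = 0
    for page in references:
        if page in pages:
            ref_bit[pages.index(page)] = True
            continue
        faults += 1
        # Sweep the hand, clearing bits, until a frame with a clear bit is found.
        while ref_bit[hand]:
            ref_bit[hand] = False
            hand = (hand + 1) % frames
        pages[hand] = page
        ref_bit[hand] = True
        hand = (hand + 1) % frames
    return faults

refs = [1, 2, 3, 1, 4, 1, 2, 5]  # hypothetical reference string
print(lru_replace(refs, 3))      # 6
print(clock_replace(refs, 3))    # 7
```

On this reference string the clock variant takes one more fault than exact LRU, but a hit costs it a single bit set instead of a list reordering. That is the kind of trade described above: slightly worse decisions in exchange for much cheaper tracking.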

I think I won't repeat myself again and again and
again and get lost in tons of words.

Thank you for your comments, Rik, and best wishes.
This is my last reply.
