Re: [PATCH 13/40] autonuma: CPU follow memory algorithm

Rik van Riel <riel@xxxxxxxxxx> · Mon, 02 Jul 2012 03:36:02 -0400

On 06/30/2012 11:10 AM, Nai Xia wrote:

> Yes, pte_numa or pte_young works the same way and they both can
> answer the problem of "which pages were accessed since last scan".
> For LRU, it's OK, it's quite enough. But for numa balancing it's NOT.

Getting LRU right may be much more important than getting
NUMA balancing right.

Retrieving wrongly evicted data from disk can be a million
times slower than fetching data from RAM, while the
penalty for accessing a remote NUMA node is only 20% or so.

> We also should care about the hotness of the page sets, since if the
> workloads are complex we should NOT be expecting that "if this page
> is accessed once, then it's always in my CPU cache during the whole
> last scan interval".
>
> The difference between LRU and the problem you are trying to deal
> with looks so obvious to me, I am so worried that you are still
> messing them up :(

For autonuma, it may be fine to have a lower likelihood of
obtaining an optimum result, because the penalty for getting
it wrong is so much lower.

Say that LRU evicted the wrong page once every 10,000
evictions. At a disk IO penalty of a million times slower
than accessing RAM, that would result in a 100x slowdown.

Now say that autonuma places a page in the wrong NUMA
node once every 10 times. With a 20% penalty for accessing
memory on a remote NUMA node, that results in a 2% slowdown.

Even if the NUMA penalty was 100% (2x as slow remote access
vs. local), it would only be a 10% slowdown.
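
The arithmetic above can be sketched as a simple expected-cost
model: the average added cost per access is the penalty of a wrong
decision divided by how rarely it happens. This is only a rough
illustration; the function name added_cost and its parameters
one_in and penalty are made up for this example, and the penalty
is treated as the extra cost of a wrong decision, measured in
units of one local RAM access.

```python
def added_cost(one_in: int, penalty: float) -> float:
    """Average extra cost per access, in units of one local RAM access.

    one_in  -- a wrong decision happens once every `one_in` times
    penalty -- extra cost of one wrong decision, relative to a local
               RAM access (0.2 means 20% slower, 1_000_000 means
               roughly a million times slower)
    """
    if one_in <= 0:
        raise ValueError("one_in must be positive")
    return penalty / one_in


# LRU: one wrong eviction in 10,000, disk about a million times slower.
lru = added_cost(10_000, 1_000_000)

# autonuma: one wrong placement in 10, remote access 20% slower.
numa_20 = added_cost(10, 0.20)

# autonuma again, with a pessimistic 100% remote penalty.
numa_100 = added_cost(10, 1.00)

print(f"LRU:             about {lru:.0f}x slowdown")
print(f"NUMA, 20% cost:  about {numa_20:.0%} slowdown")
print(f"NUMA, 100% cost: about {numa_100:.0%} slowdown")
```

The model ignores everything except the error rate and the
penalty, which is all the comparison above relies on.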

Why do you think CPU caches can get away with such small
associativity sets and simple eviction algorithms? :)

--
All rights reversed
