On 06/30/2012 11:10 AM, Nai Xia wrote:
> Yes, pte_numa or pte_young works the same way and they both can answer the problem of "which pages were accessed since last scan". For LRU, it's OK, it's quite enough. But for numa balancing it's NOT.
Getting LRU right may be much more important than getting NUMA balancing right. Retrieving wrongly evicted data from disk can be a million times slower than fetching data from RAM, while the penalty for accessing a remote NUMA node is only 20% or so.
> We also should care about the hotness of the page sets, since if the workloads are complex we should NOT be expecting that "if this page is accessed once, then it's always in my CPU cache during the whole last scan interval". The difference between LRU and the problem you are trying to deal with looks so obvious to me, I am so worried that you are still messing them up :(
For autonuma, it may be fine to have a lower likelihood of obtaining an optimum result, because the penalty for getting it wrong is so much lower.

Say that LRU evicted the wrong page once every 10,000 evictions. At a disk IO penalty of a million times slower than accessing RAM, that would result in a 100x slowdown.

Now say that autonuma places a page in the wrong NUMA node once every 10 times. With a 20% penalty for accessing memory on a remote NUMA node, that results in a 2% slowdown. Even if the NUMA penalty was 100% (2x as slow remote access vs. local), it would only be a 10% slowdown.

Why do you think CPU caches can get away with such small associativity sets and simple eviction algorithms? :)

-- 
All rights reversed
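
For anyone who wants to check the arithmetic above, here is a rough sketch in Python. The function name average_slowdown and the simple model (a correct decision costs 1.0, a wrong one costs the penalty factor) are assumptions made for illustration; they are not taken from any kernel code.

```python
def average_slowdown(miss_rate, miss_cost):
    """Average cost per access, relative to an access that is always 'right'.

    miss_rate: fraction of decisions that turn out wrong (assumed independent).
    miss_cost: cost of a wrong decision as a multiple of the normal cost (1.0).
    """
    return (1.0 - miss_rate) * 1.0 + miss_rate * miss_cost


# LRU evicts the wrong page once per 10,000 evictions; disk is ~1,000,000x slower.
lru = average_slowdown(1 / 10_000, 1_000_000)

# autonuma misplaces one page in 10; remote access is 20% slower (1.2x).
numa_20 = average_slowdown(1 / 10, 1.2)

# Same misplacement rate, but remote access is 100% slower (2x).
numa_100 = average_slowdown(1 / 10, 2.0)

print(f"LRU:        {lru:.0f}x")                    # roughly 100x
print(f"NUMA, 20%:  {(numa_20 - 1) * 100:.0f}% slower")   # 2%
print(f"NUMA, 100%: {(numa_100 - 1) * 100:.0f}% slower")  # 10%
```

The point of the sketch is only that the result is dominated by miss_cost: a rare mistake with a huge penalty hurts far more than a frequent mistake with a small one.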