On 04/08/2014 10:17 AM, Christoph Lameter wrote:
> Another solution here would be to increase the threshold so that
> 4 socket machines do not enable zone reclaim by default. The larger the
> NUMA system is the more memory is off node from the perspective of a
> processor and the larger the hit from remote memory.

8- and 16-socket machines aren't common for nonspecialist workloads *now*, but by the time these changes make it into supported distribution kernels, they may very well be. So having zone_reclaim_mode turn itself on automatically on machines with more than 8 sockets would still be a booby trap ("Boss, I dunno. I installed the additional processors and memory performance went to hell!").

For zone_reclaim_mode=1 to be useful on standard servers, both of the following need to be true:

1. the user has set CPU affinity for their applications;
2. each application needs no more than one memory bank's worth of cache.

The thing is, there is *no way* for Linux to know whether both of the above are true.

Now, I can certainly imagine non-HPC workloads for which both would be true; for example, I've set up VMware ESX servers where each VM has one socket and one memory bank. However, if the user knows enough to set up socket affinity, they know enough to set zone_reclaim_mode = 1. The default should cover the know-nothing case, not the experienced specialist case.

I'd also argue that the entire zone_reclaim_mode algorithm rests on a false assumption. Reclaiming local page cache to avoid a remote allocation means the evicted data may later have to be reread from disk, and no memory bank is ever as distant as disk is.

However, if it's off by default, then I don't care.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

-- 
To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx.
For more info on Linux MM, see: http://www.linux-mm.org/ .