On Wed, 2010-08-18 at 23:21 +0800, Wu Fengguang wrote:
> Andi, Christoph and Lee:
>
> This looks like an "unbalanced NUMA memory usage leading to premature
> swapping" problem.

What is the value of the vm.zone_reclaim_mode sysctl? If it is non-zero,
the system will go into zone reclaim before allocating off-node pages.
However, it shouldn't "swap" in this case unless
(zone_reclaim_mode & 4) != 0. And even then, zone reclaim should only
reclaim file pages, not anon. In theory...

Note: zone_reclaim_mode will be enabled by default [= 1] if the SLIT
contains any distances > 2.0 [20]. Check SLIT values via
'numactl --hardware'.

Lee

>
> Thanks,
> Fengguang
>
> On Wed, Aug 18, 2010 at 10:46:59PM +0800, Chris Webb wrote:
> > Wu Fengguang <fengguang.wu@xxxxxxxxx> writes:
> >
> > > Did you enable any NUMA policy? That could start swapping even if
> > > there are lots of free pages in some nodes.
> >
> > Hi. Thanks for the follow-up. We haven't done any configuration or tuning of
> > NUMA behaviour, but NUMA support is definitely compiled into the kernel:
> >
> >   # zgrep NUMA /proc/config.gz
> >   CONFIG_NUMA_IRQ_DESC=y
> >   CONFIG_NUMA=y
> >   CONFIG_K8_NUMA=y
> >   CONFIG_X86_64_ACPI_NUMA=y
> >   # CONFIG_NUMA_EMU is not set
> >   CONFIG_ACPI_NUMA=y
> >   # grep -i numa /var/log/dmesg.boot
> >   NUMA: Allocated memnodemap from b000 - 1b540
> >   NUMA: Using 20 for the hash shift.
> >
> > > Are your free pages equally distributed over the nodes? Or limited to
> > > some of the nodes? Try this command:
> > >
> > >         grep MemFree /sys/devices/system/node/node*/meminfo
> >
> > My worst-case machines currently have swap completely turned off to make them
> > usable for clients, but I have one machine which is about 3GB into swap with
> > 8GB of buffers and 3GB free.
> > This shows
> >
> >   # grep MemFree /sys/devices/system/node/node*/meminfo
> >   /sys/devices/system/node/node0/meminfo:Node 0 MemFree: 954500 kB
> >   /sys/devices/system/node/node1/meminfo:Node 1 MemFree: 2374528 kB
> >
> > I could definitely imagine that one of the nodes could have dipped down to
> > zero in the past. I'll try enabling swap on one of our machines with the bad
> > problem late tonight and repeat the experiment. The node meminfo on this box
> > currently looks like
> >
> >   # grep MemFree /sys/devices/system/node/node*/meminfo
> >   /sys/devices/system/node/node0/meminfo:Node 0 MemFree: 82732 kB
> >   /sys/devices/system/node/node1/meminfo:Node 1 MemFree: 1723896 kB
> >
> > Best wishes,
> >
> > Chris.
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@xxxxxxxxxx For more info on Linux MM,
> see: http://www.linux-mm.org/ .
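
The two checks discussed in this thread (decoding vm.zone_reclaim_mode and
comparing per-node MemFree) can be sketched in Python as below. This is only
an illustrative sketch, not something posted in the thread: the function
names are hypothetical, the bit meanings (1 = zone reclaim on, 2 = write out
dirty pages, 4 = swap pages) are taken from the kernel's sysctl documentation,
and the sample text is the grep output quoted above. On a live system the
inputs would come from /proc/sys/vm/zone_reclaim_mode and
/sys/devices/system/node/node*/meminfo.

```python
import re

# Bit meanings as described in the kernel's vm sysctl documentation.
# The dictionary and function names here are hypothetical, for illustration.
RECLAIM_FLAGS = {
    1: "zone reclaim on",
    2: "zone reclaim writes dirty pages out",
    4: "zone reclaim swaps pages",
}

# Matches lines such as "Node 0 MemFree: 954500 kB", with or without a
# leading "path:" prefix added by grep.
MEMFREE_RE = re.compile(r"Node\s+(\d+)\s+MemFree:\s+(\d+)\s+kB")


def decode_zone_reclaim_mode(mode):
    """Return descriptions of the bits set in a zone_reclaim_mode value."""
    return [desc for bit, desc in RECLAIM_FLAGS.items() if mode & bit]


def may_swap(mode):
    """True if zone reclaim is allowed to swap, i.e. (mode & 4) != 0."""
    return (mode & 4) != 0


def parse_memfree(text):
    """Map node number to MemFree in kB from per-node meminfo text."""
    return {int(node): int(kb) for node, kb in MEMFREE_RE.findall(text)}


if __name__ == "__main__":
    # Sample taken from the grep output quoted in this thread.
    sample = (
        "/sys/devices/system/node/node0/meminfo:Node 0 MemFree: 82732 kB\n"
        "/sys/devices/system/node/node1/meminfo:Node 1 MemFree: 1723896 kB\n"
    )
    free = parse_memfree(sample)
    for node, kb in sorted(free.items()):
        print("node%d: %d kB free" % (node, kb))
    # Example mode value only; the real value was not reported in the thread.
    print(decode_zone_reclaim_mode(1), "may swap:", may_swap(1))
```

A large gap between nodes, as in the sample, is the imbalance being asked
about; whether it explains the swapping still depends on the actual
zone_reclaim_mode value and any NUMA policy in effect.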