On 03/16/2012 10:40 AM, Peter Zijlstra wrote: At LSF/MM, there was a presentation comparing Peter's NUMA code with Andrea's NUMA code. I believe this is the main reason why Andrea's code performed better in that particular test...
+ if (sched_feat(NUMA_BALANCE_FILTER)) { + /* + * Avoid moving ne's when we create a larger imbalance + * on the other end. + */ + if ((imb->type & NUMA_BALANCE_CPU) && + imb->cpu - cpu_moved < ne_cpu / 2) + goto next; + + /* + * Avoid migrating ne's when we'll know we'll push our + * node over the memory limit. + */ + if (max_mem_load && + imb->mem_load + mem_moved + ne_mem > max_mem_load) + goto next; + }
IIRC the test consisted of a 16GB NUMA system with two 8GB nodes. It was running 3 KVM guests, two guests of 3GB memory each, and one guest of 6GB each. With autonuma, the 6GB guest ended up on one node, and the 3GB guests on the other. With sched numa, each node had a 3GB guest, and part of the 6GB guest. There is a fundamental difference in the balancing between autonuma and sched numa. In sched numa, a process is moved over to the current node only if the current node has space for it. Autonuma, on the other hand, operates more of a a "hostage exchange" policy, where a thread on one node is exchanged with a thread on another node, if it looks like that will reduce the overall number of cross-node NUMA faults in the system. I am not sure how to do a "hostage exchange" algorithm with sched numa, but it would seem like it could be necessary in order for some workloads to converge on a sane configuration. After all, with only about 2GB free on each node, you will never get to move either a 3GB guest, or parts of a 6GB guest... Any ideas? -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>