On 03/16/2012 10:40 AM, Peter Zijlstra wrote:
> +/*
> + * Assumes symmetric NUMA -- that is, each node is of equal size.
> + */
> +static void set_max_mem_load(unsigned long load)
> +{
> +	unsigned long old_load;
> +
> +	spin_lock(&max_mem_load.lock);
> +	old_load = max_mem_load.load;
> +	if (!old_load)
> +		old_load = load;
> +	max_mem_load.load = (old_load + load) >> 1;
> +	spin_unlock(&max_mem_load.lock);
> +}
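If I am reading this right, this is an exponentially decaying
running average, seeded with the first sample:

	load_n = (load_{n-1} + sample_n) / 2

so the most recent sample always accounts for half of the value,
and a sample from k rounds ago contributes with weight 1/2^(k+1).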
The above in your patch kind of conflicts with this bit from
patch 6/26:

+	/*
+	 * Migration allocates pages in the highest zone. If we cannot
+	 * do so then migration (at least from node to node) is not
+	 * possible.
+	 */
+	if (vma->vm_file &&
+	    gfp_zone(mapping_gfp_mask(vma->vm_file->f_mapping))
+						< policy_zone)
+		return 0;

Looking at how the memory load code is used, I wonder if it would
make sense to count "zone size - free - inactive file" pages
instead?
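Roughly something like this, as a sketch only (the function name
is made up, and I am just summing the existing per-zone vmstat
counters per node -- the exact counters to use are debatable):

#include <linux/mmzone.h>	/* NODE_DATA() */
#include <linux/vmstat.h>	/* node_page_state() */

static unsigned long node_mem_load(int nid)
{
	unsigned long present = NODE_DATA(nid)->node_present_pages;
	unsigned long free = node_page_state(nid, NR_FREE_PAGES);
	unsigned long inactive_file = node_page_state(nid, NR_INACTIVE_FILE);

	/*
	 * Count pages that are neither free nor trivially
	 * reclaimable (inactive page cache): the memory that
	 * actually ties a workload to this node.
	 */
	return present - free - inactive_file;
}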
> +	/*
> +	 * Avoid migrating ne's when we'll know we'll push our
> +	 * node over the memory limit.
> +	 */
> +	if (max_mem_load &&
> +	    imb->mem_load + mem_moved + ne_mem > max_mem_load)
> +		goto next;
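To make the check concrete: if max_mem_load has averaged out to,
say, 1000 pages, and the destination node is already at
imb->mem_load = 900 with mem_moved = 50 pages moved so far, an
entity with ne_mem = 80 gives 900 + 50 + 80 = 1030 > 1000 and is
skipped. (Numbers made up, obviously.)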
> +static void numa_balance(struct node_queue *this_nq)
> +{
> +	struct numa_imbalance imb;
> +	int busiest;
> +
> +	busiest = find_busiest_node(this_nq->node, &imb);
> +	if (busiest == -1)
> +		return;
> +
> +	if (imb.cpu <= 0 && imb.mem <= 0)
> +		return;
> +
> +	move_processes(nq_of(busiest), this_nq, &imb);
> +}
You asked how and why Andrea's algorithm converges. After looking
at both patch sets for a while, and asking for clarification, I
think I can see how his code converges. It is not yet clear to me
how and why your code converges. I see some dual bin-packing
(CPU & memory) heuristics, but it is not at all clear to me how
they interact, especially when workloads go active and idle on a
regular basis.

--
All rights reversed