Hi Paul,

On Mon, Feb 06, 2012 at 05:15:08AM -0800, Paul Turner wrote:
> I don't see it proposed as a topic yet (unless I missed it) but I spoke with
> Peter briefly and I think this would be a good opportunity in particular to
> discuss NUMA-aware load-balancing. Currently, we only try to solve the 1-d
> problem of optimizing for weight; but there's recently been interest from
> several parties in improving this. Issues involves proactively accounting for
> the distribution of current allocations, determining when to initiate reactive
> migration (or when not to move tasks!), and the associated grouping semantics.

Thanks for the topic proposal. We actually planned to have two topic slots for NUMA-aware balancing. It was also proposed as a topic by Andi and Peter, but they didn't CC linux-mm, so it wasn't visible there yet.

BTW, not sure if you noticed, but I also got AutoNUMA working pretty well over the last month. I posted the results and the testcases used for the benchmarks on linux-kernel. I'm still cleaning it up. I've yet to do a full evaluation in a mixed virt environment, though (on the host it seems to work as well as hard bindings so far, especially in the testcases I developed to test it, which start from worst-case memory placement scenarios).

It tracks both mm<->process affinity and page<->thread affinity, and it balances processes vs. processes and threads vs. threads in the scheduler (while still maxing out all idle CPUs, of course). Shared memory accessed from different nodes is handled with a heuristic.

By default I only allow CPU_IDLE/NEWIDLE load balances across nodes, as that performs best globally :) but it tends to partition the NUMA system more, so it's less fair. I'll add a tweak to also allow load balancing across nodes for non-idle CPUs (switching between the two modes is a few-line change), but even in that case it always tries to find an affine task first (doubling the number of passes).
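The cross-node balancing policy described above can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not AutoNUMA's actual code, which lives in C inside the kernel scheduler. `Task`, its per-node `node_accesses` samples (a stand-in for the page<->thread statistics), `IdleType`, `pick_task` and the `allow_busy` mode switch are all hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional, Sequence


class IdleType(Enum):
    # Hypothetical stand-ins for the kernel's CPU_IDLE / CPU_NEWLY_IDLE / CPU_NOT_IDLE.
    IDLE = auto()
    NEWLY_IDLE = auto()
    NOT_IDLE = auto()


@dataclass
class Task:
    tid: int
    # Hypothetical per-node access samples, indexed by node id.
    node_accesses: Sequence[int]

    def preferred_node(self) -> int:
        # Node with the most sampled accesses; ties go to the lowest node id.
        return max(range(len(self.node_accesses)),
                   key=lambda n: (self.node_accesses[n], -n))


def may_balance_across_nodes(idle: IdleType, allow_busy: bool = False) -> bool:
    # Default mode: only idle or newly idle CPUs may pull tasks from another node.
    # allow_busy=True models the hypothetical switch to the second mode.
    return idle in (IdleType.IDLE, IdleType.NEWLY_IDLE) or allow_busy


def pick_task(runqueue: Sequence[Task], src_node: int, dst_node: int,
              idle: IdleType, allow_busy: bool = False) -> Optional[Task]:
    """Choose a task to pull from src_node's runqueue onto a CPU on dst_node."""
    if src_node != dst_node and not may_balance_across_nodes(idle, allow_busy):
        return None
    # Pass 1: only tasks whose memory mostly lives on dst_node (NUMA-affine).
    for task in runqueue:
        if task.preferred_node() == dst_node:
            return task
    # Pass 2: fall back to any task, so an idle CPU still gets work.
    return runqueue[0] if runqueue else None


rq = [Task(1, [9, 1]), Task(2, [2, 8]), Task(3, [5, 5])]
print(pick_task(rq, 0, 1, IdleType.IDLE).tid)                       # 2
print(pick_task(rq, 0, 1, IdleType.NOT_IDLE))                       # None
print(pick_task(rq, 0, 1, IdleType.NOT_IDLE, allow_busy=True).tid)  # 2
```

In this sketch the affine-first search is why the number of passes doubles: a second, unrestricted pass is only needed when no task on the source runqueue prefers the destination node.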
Overall there's huge room for improvement in the scheduler area. The way I hooked into the scheduler to drive it from the NUMA statistics is quite self-contained but probably not the best for the long term (though at runtime it shouldn't matter, so I'll leave it for a second-stage cleanup if these algorithms prove worthwhile).

Thanks,
Andrea