On 05/18/2012 06:42 AM, tip-bot for Peter Zijlstra wrote:
> Now that we have a NUMA process scheduler, provide a syscall interface
> for finer granularity NUMA balancing. In particular this allows setting
> up NUMA groups of threads and vmas within a process.
>
> For this we introduce two new syscalls:
>
>   sys_numa_tbind(int tig, int ng_id, unsigned long flags);
>
> Bind a thread to a numa group, query its binding or create a new group:
>
>   sys_numa_tbind(tid, -1, 0);     // create new group, return new ng_id
>   sys_numa_tbind(tid, -2, 0);     // returns existing ng_id
>   sys_numa_tbind(tid, ng_id, 0);  // set ng_id

I am not convinced this is the right way forward.

While this may work well for programs written in languages with pointers, and for virtual machines, I do not see how e.g. a JVM could provide useful hints to the kernel, because the Java program running on top has no idea about the memory addresses of its objects, and the Java language has no way to hint which thread will be the predominant user of an object.

I like your code for handling smaller processes in NUMA systems, but we do need to have a serious discussion on how to handle processes that do not fit in one node. The more I think about it, the more Andrea's code looks like it might be the more flexible way forward.

Another topic to discuss is whether we want lazy migrate-on-fault, or whether we want to keep the program running and use another (idle) core to do the migration in the background.

-- 
All rights reversed