On Mon, 2009-05-04 at 18:02 +0800, Miao Xie wrote:
> on 2009-5-4 11:06 Lee Schermerhorn wrote:
> > Against: 2.6.20-rc3-mmotm-090428-1631
> >
> > Since cpusetmm-update-tasks-mems_allowed-in-time.patch removed the call outs
> > to cpuset_update_task_memory_state(), tasks in the top cpuset don't get their
> > mems_allowed updated to just nodes with memory. cpuset_init() initializes
> > the top cpuset's mems_allowed with nodes_setall() and
> > cpuset_init_current_mems_allowed() and kernel_init() initialize the kernel
> > initialization tasks' mems_allowed to all possible nodes. Tasks in the top
> > cpuset that inherit the init task's mems_allowed without modification will
> > have all possible nodes set. This can be seen by examining the Mems_allowed
> > field in /proc/<pid>/status in such a task.
> >
> > "numactl --interleave=all" also initializes the interleave node mask to all
> > ones, depending on the masking with mems_allowed to eliminate non-existent
> > nodes and nodes without memory. As this was not happening, the interleave
> > policy was attempting to dereference non-existent nodes.
> >
> > This patch modifies the nodes_setall() calls in two cpuset init functions and
> > the initialization of task #1's mems_allowed to use node_states[N_HIGH_MEMORY].
> > This mask has been initialized to contain only existing nodes with memory by
> > the time the respective init functions are called.
>
> You forget to modify the cpuset_attach(). This function will initialize the
> mems_allowed of the task which is being moved into the top cpuset by node_possible_map.

Thanks, I'll look at that. I had tested moving tasks between cpusets and
thought that it was working, but I'd been looking at this for a while and
could have been imagining it. I'll look for all uses of node_possible_map, etc.

>
> Beside that, if you use node_states[N_HIGH_MEMORY] to initialize the mems_allowed
> of the tasks in the top cpuset, you must update it when adding a node with memory into
> the system.
> So you also must modify cpuset_track_online_nodes().

So we'll need to walk the tasks in the top-level cpuset and update their
mems_allowed when nodes go on- or off-line. I'd have thought we already did
that, but I must admit I didn't check. I'll take a look at how
cpuset_track_online_nodes() interacts with mems_allowed, ...

Lee
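The masking problem and the proposed hotplug update can be illustrated with a toy user-space model. This is a minimal sketch, assuming nodemasks are plain 64-bit bitmaps (bit n set means node n). `nodemask_model_t`, `struct task_model`, `effective_interleave()` and `track_online_nodes()` are hypothetical names, not the kernel's nodemask_t API or the actual cpuset code. The sketch shows why an all-ones mems_allowed lets an "--interleave=all" mask keep memoryless nodes, and how refreshing the top-cpuset tasks' mems_allowed when the set of nodes with memory changes keeps them in sync.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: bit n set means node n is in the mask.
 * This is NOT the kernel's nodemask_t; illustration only. */
typedef uint64_t nodemask_model_t;

/* Hypothetical stand-in for a task: cpuset membership plus mems_allowed. */
struct task_model {
    int in_top_cpuset;              /* nonzero if the task is in the top cpuset */
    nodemask_model_t mems_allowed;  /* nodes this task may allocate from */
};

/* An interleave request (all ones for --interleave=all) is only safe once
 * it is masked with mems_allowed. If mems_allowed is itself "all possible
 * nodes", memoryless nodes survive the masking. */
static nodemask_model_t effective_interleave(nodemask_model_t requested,
                                             nodemask_model_t mems_allowed)
{
    return requested & mems_allowed;
}

/* Assumed hotplug hook: when the set of nodes with memory changes, reset
 * mems_allowed of every task in the top cpuset to that set. Tasks in child
 * cpusets are left untouched in this sketch. */
static void track_online_nodes(struct task_model *tasks, size_t ntasks,
                               nodemask_model_t nodes_with_memory)
{
    assert(nodes_with_memory != 0);  /* at least one node must have memory */
    for (size_t i = 0; i < ntasks; i++) {
        if (tasks[i].in_top_cpuset)
            tasks[i].mems_allowed = nodes_with_memory;
    }
}
```

For example, with nodes 0-3 possible but only nodes 0-1 holding memory, `effective_interleave(~0, 0xF)` still contains nodes 2 and 3, whereas `effective_interleave(~0, 0x3)` yields just nodes 0 and 1. A real implementation would also need appropriate locking around the task walk, which this sketch ignores.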