On Wed, 10 Oct 2012, Peter Zijlstra wrote:

> > If cpu_to_node() always returns a valid node id even if all cpus on the
> > node are offline, then the cpumask_of_node() implementation, which the
> > sched code is using, should either return an empty cpumask (if
> > node_to_cpumask_map[nid] isn't freed) or cpu_online_mask.  The change in
> > behavior here occurred because
> > cpu_hotplug-unmap-cpu2node-when-the-cpu-is-hotremoved.patch in -mm doesn't
> > return a valid node id and forces it to return -1, so a kzalloc_node(...,
> > -1) falls back to allocating anywhere.
>
> I think that's broken semantics.. so far the entire cpu<->node mapping
> was invariant during hotplug.  Changing that is going to be _very_
> interesting and cannot be done lightly.
>
> Because as I said, per-cpu memory is preserved over hotplug, and that
> has numa affinity.
>
> So for now, let me NACK that patch.  You cannot go change stuff like
> that.

Agreed, that brings the nack count to 2 now.  Andrew, please remove

	cpu_hotplug-unmap-cpu2node-when-the-cpu-is-hotremoved.patch
	cpu_hotplug-unmap-cpu2node-when-the-cpu-is-hotremoved-fix.patch

from -mm.

> > But if you only need cpu_to_node() when waking up to find a runnable cpu
> > for this NUMA information, then I think you can just change the
> > kzalloc_node() in alloc_{fair,rt}_sched_group() to do
> > kzalloc(..., cpu_online(cpu) ? cpu_to_node(cpu) : NUMA_NO_NODE).
>
> That's a confusing statement, the wakeup stuff and the
> alloc_{fair,rt}_sched_group() stuff are unrelated, although both sites
> might need fixing if we're going to go ahead with this.

The alternative is for node hot-remove to iterate over all possible cpus
and set cpu-to-node to NUMA_NO_NODE for every offlined cpu that maps to
that node.  If cpu_online() is true for any of those cpus, then obviously
the node can't be offlined.
We want to do this so that kzalloc_node(..., cpu_to_node()) falls back to
allocating from any node, as it should, and because a subsequent node
hot-add event that reuses the same node id may not be the same physical
node.