On Tue, 9 Oct 2012, Tang Chen wrote:

> > Eek, the nid shouldn't be -1 yet, though, for cpu hotplug since this
> > should be called at CPU_DYING level and migrate_tasks() still sees a valid
> > cpu.
>
> As Wen said below, nid is now set to -1 when cpu is hotremoved.
> I reproduce this problem in this situation:
>
> all cpus are online, and hot remove a system board directly, without
> offlining any cpu.
>
> As a result, the removed cpu's nid is set to -1, and this causes
> problems.
>

Let's add Andrew to the cc list then, because I'm nacking
cpu_hotplug-unmap-cpu2node-when-the-cpu-is-hotremoved.patch in the -mm
tree for this reason.

We can only clear a cpu-to-node mapping when the cpu is completely
offline, not before or during the CPU_DYING stage.  Kernel code, such as
the sched code that you are now trying to "fix", depends on this mapping
to work correctly; obviously no audit was done of cpu hotplug code
depending on it before the patch was proposed.

I say "fix" because even this workaround isn't a good solution: it would
be much better to pick another cpu on the same node as the offlining cpu
for the runqueue before falling back to the set of all allowed nodes.  We
lose all NUMA affinity information with that patch.  There's no reason
why we shouldn't know the node of a cpu that is being offlined.

So nack to cpu_hotplug-unmap-cpu2node-when-the-cpu-is-hotremoved.patch.
After it's removed because it's buggy, this "fix" will no longer be
necessary.
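
To make the preferred fallback order concrete, here is a minimal
userspace sketch in C.  It is not the kernel's scheduler code: struct
topo, pick_fallback_cpu() and NO_CPU are hypothetical names invented for
this illustration, and plain arrays stand in for the kernel's cpumasks
and cpu-to-node mapping.  It only shows why the node of the dying cpu
needs to stay valid: the same-node pass can only run while the mapping
is still intact.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NO_CPU (-1)	/* hypothetical "nothing found" value */

/* Hypothetical userspace model of the topology; not a kernel structure. */
struct topo {
	size_t nr_cpus;
	const int *cpu_node;	/* node id per cpu, -1 if the mapping was cleared */
	const bool *online;	/* cpu can still run tasks */
	const bool *allowed;	/* cpu is in the task's affinity mask */
};

/*
 * Pick a destination cpu for a task that has to leave dying_cpu.
 * Pass 1: an online, allowed cpu on the same node as dying_cpu.
 * Pass 2: any online, allowed cpu, regardless of node.
 */
static int pick_fallback_cpu(const struct topo *t, size_t dying_cpu)
{
	size_t cpu;
	int nid;

	assert(dying_cpu < t->nr_cpus);
	nid = t->cpu_node[dying_cpu];

	/* Pass 1 is only possible while the cpu-to-node mapping is valid. */
	if (nid >= 0) {
		for (cpu = 0; cpu < t->nr_cpus; cpu++) {
			if (cpu == dying_cpu)
				continue;
			if (t->cpu_node[cpu] == nid &&
			    t->online[cpu] && t->allowed[cpu])
				return (int)cpu;
		}
	}

	/* Pass 2: NUMA affinity is lost, take any usable cpu. */
	for (cpu = 0; cpu < t->nr_cpus; cpu++) {
		if (cpu == dying_cpu)
			continue;
		if (t->online[cpu] && t->allowed[cpu])
			return (int)cpu;
	}

	return NO_CPU;
}
```

With four cpus on two nodes (cpus 0-1 on node 0, cpus 2-3 on node 1) and
cpu 2 going away, the sketch picks cpu 3 as long as cpu 2 still maps to
node 1.  If the mapping for cpu 2 has already been set to -1, the first
pass is skipped and the task lands on cpu 0, on the other node.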