On Fri, 4 May 2012, Peter Zijlstra wrote:

> That said, the whole suspend/resume 'problem' does seem worth fixing and
> is a very special case where we absolutely know we're going to get back
> in the state we are in and userspace isn't actually running. So ideally
> we'd go with the bhat's patch that skips the sched_domain rebuilds
> entirely +- some bug-fixes ;-).

Just as an interesting side comment...

The USB subsystem faced this same problem years ago. The question was:
When a USB device (especially a mass-storage device) is unplugged and
then reconnected, is the new device instance the same as the old one?
Linus stepped in and firmly assured us that it was not.

That's very much like the situation you're describing: If CPU 4 is
hot-unplugged and then a new CPU appears in slot 4, is it the same CPU
as before (and does it therefore belong to the same cpusets as before)?

But this led to problems during suspend, because not all systems could
maintain bus connectivity while the system was asleep, and almost none
can during hibernation. As a result, mounted filesystems would become
unavailable after resume even though the USB storage device had been
plugged in the whole time. To the kernel, it appeared that the device
had been unplugged during suspend and then replugged during resume.

We ended up adopting a special-purpose solution just to handle that
case. It's described in Documentation/usb/persist.txt if you want the
full details. In brief, when the system resumes it checks to see if a
device appears to be present at the same port where a device used to
be. If it is, and if its descriptors match the values remembered for
the former device, then we accept the new device as being the same as
the old one, even though the hardware indicates that the connection was
not maintained during the system sleep.

From my point of view, this suggests that CPU hot-unplug is not quite
the right tool to use during suspend.
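For anyone who wants the shape of that persist check spelled out, here
is a minimal sketch in C. It is hypothetical: struct remembered_dev,
same_device_after_resume() and the field layout are made up for
illustration and are not the USB core's actual code. It assumes the
remembered state is just the port number plus the raw device descriptor
and configuration descriptor bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical snapshot of what is remembered about a device across
 * system sleep.  Names and layout are illustrative only. */
struct remembered_dev {
	int port;                     /* hub port the device was attached to */
	unsigned char desc[18];       /* raw device descriptor (18 bytes) */
	size_t config_len;            /* length of the config bytes below */
	const unsigned char *config;  /* raw configuration descriptor bytes */
};

/* Return true if the device found at resume may be treated as the same
 * device that was present before the system went to sleep. */
static bool same_device_after_resume(const struct remembered_dev *old,
				     const struct remembered_dev *now)
{
	assert(old != NULL && now != NULL);

	/* A device must be present at the same port ... */
	if (old->port != now->port)
		return false;

	/* ... and its descriptors must match the remembered values. */
	if (memcmp(old->desc, now->desc, sizeof(old->desc)) != 0)
		return false;
	if (old->config_len != now->config_len)
		return false;
	return old->config_len == 0 ||
	       memcmp(old->config, now->config, old->config_len) == 0;
}
```

Matching descriptors only shows that the device looks the same; two
identical devices swapped while the system was asleep would pass this
check too.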
The CPU doesn't actually go away; it merely becomes unusable for a
while. In other words, this approach applies an incorrect abstraction.
What's really needed is something a little different: a way to avoid
running any tasks on that CPU while not removing it from the system.
If this means some tasks can no longer run on any CPUs, so be it -- this
happens only during suspend, after all. Then during resume, when the
CPU is brought back up, tasks are allowed to run on it again.

Alan Stern