* Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:

> Now, we've gone in blind before - most notably on the
> containers/cgroups/namespaces stuff. That hail mary pass worked out
> acceptably, I think. Maybe we got lucky. I thought that
> net-namespaces in particular would never get there, but it did.
>
> That was a very large and quite long-term-important user-visible
> feature.
>
> checkpoint/restart/migration is also a long-term-...-feature. But if
> at all possible I do think that we should go into it with our eyes a
> little less shut.

IMO, s/.../important/

More important than containers in fact. Being able to detach all
software state from the hw state and being able to reattach it:

  1) at a later point in time, or
  2) in a different piece of hardware, or
  3) [future] in a different kernel

... is powerful stuff on a very conceptual level IMO.

The reason we don't have it in every OS is not that it's undesired,
but that it's very, very hard to do on a wide scale. But people would
love it even if it adds (some) overhead.

This kind of featureset is actually the main motivator for
virtualization. If the native kernel were able to do checkpointing
we'd have not only near-zero-cost virtualization done at the right
abstraction level (when combined with containers/control-groups), but
we'd also have a few future feature items like:

  1) Kernel upgrades done intelligently: transparent reboot into an
     upgraded kernel.

  2) Downgrade-on-regressions done sanely: transparent
     downgrade+reboot to a known-working kernel. (As long as the
     regression is app misbehavior or a performance problem - not a
     kernel crash. Most regressions on kernel upgrades are not actual
     crashes or data corruption but functional and performance
     regressions - i.e. the system is safely checkpointable and
     downgradeable.)

  3) Hibernation done intelligently: checkpoint everything, turn off
     system. Turn on system, restore everything from the checkpoint.

  4) Backups done intelligently: full "backups" of long-running
     computational jobs, maybe even of complex things like databases
     or desktop sessions.

  5) Remote debugging done intelligently: got a crashed session?
     Checkpoint the whole app in its anomalous state and upload the
     image (as long as you can trust the developer with that image
     and with the filesystem state that goes with it).

I don't see many long-term dragons here. The kernel is obviously
always able to do near-zero-overhead checkpointing: it knows about all
its own data structures, can enumerate them and knows how they map to
user-space objects.

The rest is performance considerations: do we want to embed
checkpointing helpers in certain runtime codepaths, to make
checkpointing faster? But if that is undesirable (serialization,
etc.), we can always fall back to the dumbest, zero-overhead methods.

There is _one_ interim runtime cost: the "can we checkpoint or not"
decision that the kernel has to make while the feature is not
complete.

That, if this feature takes off, is just a short-term worry - as
basically everything will be checkpointable in the long run.

In any case, by designing checkpointing to reuse the existing LSM
callbacks, we'd hit multiple birds with the same stone. (One of which
is the constant complaints about the runtime costs of the LSM
callbacks - with checkpointing we get an independent, non-security
user of the facility, which is a nice touch.)

So all things considered it does not look like a bad deal to me - but
I might be missing something nasty.

	Ingo