Oren Laadan wrote:
>
> Dan Smith wrote:
>> SH> Well it forces restart to go through the established userspace
>> SH> API's when creating resources (in this case, tasks and namespaces)
>> SH> which means any existing security guarantees are leveraged.
>>
>> That's a very valid point. However, it still seems unbalanced to make
>> checkpoint a completely in-kernel process and restart an odd mix of
>> the two with potentially more confusing semantics and requirements.
>>
>
> There are other reasons to allow restart to be not fully symmetric
> with respect to checkpoint. For example, if you have a smart(er) user
> space application that wants to provide the restart some of the resources
> pre-constructed, allowing much flexibility (already requested by people)
> for the restart procedure (E.g., when doing distributed checkpoint, or
> when restarting a special device whose).

Yes, the arguments you have for restart are also valid for checkpoint
in a distributed checkpoint scenario. You want to be able to easily and
rapidly abort the checkpoint of a job when one node (among thousands)
fails for some reason; a batch manager would use a signal for that. You
also want fine-grained synchronization for the network when migrating
only one node.

We've had to solve the above issues on a large HPC project, and there
are plenty of other good reasons to have a mix of kernel and user space
for restart and for checkpoint.

C.
_______________________________________________
Containers mailing list
Containers@xxxxxxxxxxxxxxxxxxxxxxxxxx
https://lists.linux-foundation.org/mailman/listinfo/containers