On Mon, Aug 04, 2008 at 08:51:37PM -0700, Joseph Ruscio wrote:
> As somewhat of a tangent to this discussion, I've been giving some
> thought to the general strategy we talked about during the summit. The
> checkpointing solution we built at Evergrid sits completely in userspace
> and is solely focused on checkpointing parallel codes (e.g. MPI). That
> approach required us to virtualize a whole slew of resources (e.g. PIDs)
> that will be far better supported in the kernel through this effort. On
> the other hand, there isn't anything inherent to checkpointing the memory
> in a process that requires it to be in a kernel. During a restart, you
> can map and load the memory from the checkpoint file in userspace as
> easily as in the kernel. Since the cost of checkpointing HPC codes is

Hmm, for unusual mappings this may not be so easy to reproduce from
userspace if binaries are statically linked. I agree that with dynamically
linked applications, LD_PRELOAD allows one to record the actual memory
mappings and restore them at restart.

> fairly dominated by checkpointing their large memory footprints, memory
> checkpointing is an area of ongoing research with many different
> solutions.
>
> It might be desirable for the checkpointing implementation to be modular
> enough that a userspace application or library could select to handle
> certain resources on their own. Memory is the primary one that comes to
> mind.

I definitely agree with you about this flexibility. Actually, in Kerrighed,
over the next 3 years we are going to study an API for collaborative
checkpoint/restart between kernel and userspace, in order to allow such HPC
apps to checkpoint huge memory efficiently (e.g. when reaching states where
saving small parts is enough), or to rebuild their data from partial/older
states. I hope that this study will bring useful ideas that could be applied
to containers as well.
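For concreteness, here is a minimal sketch of the "map and load the memory from the checkpoint file in userspace" step discussed above. It is only an illustration under stated assumptions: the `region_hdr` layout and the `save_region`/`restore_region` helpers are hypothetical names made up for this example, not code from Evergrid, Kerrighed, or the containers patches. The saved address is passed to mmap() only as a hint; a real restart would need the region at exactly the original address (e.g. with MAP_FIXED), which is where the unusual-mapping cases get hard.

```c
#define _DEFAULT_SOURCE          /* exposes MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>

/* Hypothetical on-disk header: where the region lived and its size. */
struct region_hdr {
    uintptr_t addr;
    size_t    len;
};

/* Write one memory region (header, then raw bytes) to `path`.
 * Returns 0 on success, -1 on error. */
int save_region(const char *path, const void *addr, size_t len)
{
    assert(addr != NULL && len > 0);
    struct region_hdr hdr = { (uintptr_t)addr, len };
    FILE *f = fopen(path, "wb");
    if (!f)
        return -1;
    int ok = fwrite(&hdr, sizeof hdr, 1, f) == 1 &&
             fwrite(addr, 1, len, f) == len;
    return (fclose(f) == 0 && ok) ? 0 : -1;
}

/* Map fresh anonymous memory and load the saved bytes into it.
 * The recorded address is only a placement hint here, so the kernel
 * may return a different address if the original range is in use.
 * Returns the new mapping, or NULL on error. */
void *restore_region(const char *path, size_t *len_out)
{
    struct region_hdr hdr;
    FILE *f = fopen(path, "rb");
    if (!f)
        return NULL;
    if (fread(&hdr, sizeof hdr, 1, f) != 1) {
        fclose(f);
        return NULL;
    }
    void *p = mmap((void *)hdr.addr, hdr.len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        fclose(f);
        return NULL;
    }
    if (fread(p, 1, hdr.len, f) != hdr.len) {
        munmap(p, hdr.len);
        fclose(f);
        return NULL;
    }
    fclose(f);
    *len_out = hdr.len;
    return p;
}
```

Recording which regions to save is the part that needs LD_PRELOAD (or some other interposition) for dynamically linked binaries; this sketch assumes the caller already knows the address and length of each region.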

Thanks,

Louis

-- 
Dr Louis Rilling                        Kerlabs - IRISA
Skype: louis.rilling                    Campus Universitaire de Beaulieu
Phone: (+33|0) 2 99 84 71 52            Avenue du General Leclerc
Fax: (+33|0) 2 99 84 71 71              35042 Rennes CEDEX - France
http://www.kerlabs.com/