On Wed, Aug 06, 2008 at 08:41:10AM -0700, Joseph Ruscio wrote:
>
> On Aug 5, 2008, at 9:20 AM, Oren Laadan wrote:
>> Eh... and, yes, live migration :)
>
> User-space live migration of a "batch" process, e.g. one taking part in
> an MPI job, is quite trivial. User-space live migration of something like
> a database is not that hard, assuming you have a cooperative load
> balancer or proxy on the front end.

Hm, this means modifying the MPI run-time, right? Especially the ones
relying on daemons on each node (like the LAM implementation, and the
MPI-2 specification IIRC). Anyway, this is probably not an issue, since
most high-end HPC systems come with their own customized MPI
implementation.

>
> I'm not advocating for implementing this in user-space. I am in complete
> agreement that this effort should result in code that completely
> checkpoints a Container in the kernel. My question was whether there are
> situations where it would be advantageous for user-space to have the
> option of instructing/hinting the kernel to ignore certain resources that
> it would handle itself. Most of the use-cases I'm thinking of come from
> the different styles of implementations I've seen in the HPC space, where
> our implementation (and a lot of others) are focused.
>
> MPI codes require coordination between all the different processes
> taking part to ensure that the checkpoints are globally consistent. MPI
> implementations that run on hardware such as Infiniband would most
> likely want the container checkpointing to ignore all of the pinned
> memory associated with the RDMA operations, so that the coordination and
> recreation of MPI communicator state could be handled in user-space. When
> working with inflexible process checkpointers, MPI coordination routines
> often must completely tear down all communicator state prior to invoking
> the checkpoint, and then recreate all the communicators after the
> checkpoint. On very large-scale jobs, this is expensive.
>
> As another example, HPC applications can create local scratch files of
> several GB in /tmp. It may not be necessary to migrate these files, but
> if user-space has no way to mark a particular file, "local files", or
> files in general as being ignored, then we'll have to copy these during a
> migration or a checkpoint.

I definitely agree with you here. This is the kind of use case we will
study in Kerrighed. (Actually, the project is centered on supporting a
petaflop-scale application, with Kerrighed helping it tolerate failures.)

>
> I don't suppose anyone is attending Linuxworld in San Francisco this
> week? I'd be more than happy to grab a coffee and talk about some of
> this. I stopped by the OpenVZ booth but none of the devs are around.

Not me, sorry :)

However, any requirements you can describe are interesting to us. They
can surely help in designing a more useful checkpoint/restart mechanism.

Thanks,

Louis

--
Dr Louis Rilling                        Kerlabs
Skype: louis.rilling                    Batiment Germanium
Phone: (+33|0) 6 80 89 08 23            80 avenue des Buttes de Coesmes
http://www.kerlabs.com/                 35700 Rennes
_______________________________________________
Containers mailing list
Containers@xxxxxxxxxxxxxxxxxxxxxxxxxx
https://lists.linux-foundation.org/mailman/listinfo/containers