On Wed, Aug 06, 2008 at 09:15:46AM -0700, Joseph Ruscio wrote:
>
> On Aug 5, 2008, at 9:23 AM, Dave Hansen wrote:
>
>> On Mon, 2008-08-04 at 20:51 -0700, Joseph Ruscio wrote:
>>> It might be desirable for the checkpointing implementation to be
>>> modular enough that a userspace application or library could select
>>> to handle certain resources on their own. Memory is the primary one
>>> that comes to mind.
>>
>> How would you propose making it modular?
>>
>> -- Dave
>>
>
> Well it seems to me that the initial focus here is in live migration of
> traditional enterprise applications, e.g. databases, app-servers, etc. I
> think this is the right focus given how much utility the general
> enterprise is finding in capabilities like VMotion. Providing this
> mobility to applications without the overhead of traditional VMs would
> be very valuable.
>
> On the other hand I've been primarily focused on checkpointing
> large-scale MPI jobs to provide fault tolerance, and that use-case is
> somewhat different than the live-migration one. These checkpoints have
> huge RAM footprints (in-core checkpointing is not an option), require
> coordination across large numbers of servers, some number of open files
> on an enormous parallel filesystem, and some scratch files open on the
> local disk/ramdisk. They generally have very simple process trees with
> one process per core, or one process with a thread for each core.
>
> To support these kinds of jobs, one would ideally instruct the Container
> checkpointer to ignore network resources, dynamically allocated private
> memory, and the contents of open files. You'd be relying on the Container
> checkpointer to recreate processes, open file descriptors, threads,
> thread synchronization primitives, IPC mechanisms (including shm).
>
> As far as the mechanism is concerned, I'd defer to the more experienced
> kernel developers here. I assume that passing a bitmask of flags as an
> argument into the checkpoint syscall would be frowned upon, and anyways
> redundant, as it's unlikely that the mask would change within a container
> from checkpoint to checkpoint. If each container is going to have a
> CGroup filesystem directory, then we could have a file(s) along the lines
> of /proc/sys/kernel/randomize_va_space that turn features off for that
> Container. The default settings after Container creation would be a
> complete in-kernel checkpoint/migration.

Did you think about mechanisms/interfaces that would let the kernel's
checkpointing sub-system and the application/run-time interact to
efficiently build the checkpoint image and restart from it?

Louis

-- 
Dr Louis Rilling                        Kerlabs
Skype: louis.rilling                    Batiment Germanium
Phone: (+33|0) 6 80 89 08 23            80 avenue des Buttes de Coesmes
http://www.kerlabs.com/                 35700 Rennes
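
To make the per-container control-file idea quoted above a bit more concrete, here is a rough userspace sketch in C of how a set of on/off files in a container's directory could be folded into a "skip" mask. Everything named here is an assumption made up for illustration: the file names (checkpoint.skip_network and friends), the CKPT_SKIP_* bits, and the helper functions do not exist in any kernel or library. A missing or zero-valued file leaves the bit clear, which matches the proposed default of a complete in-kernel checkpoint.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical bits: resources the application would handle on its own. */
enum ckpt_skip {
    CKPT_SKIP_NETWORK       = 1u << 0,
    CKPT_SKIP_PRIVATE_MEM   = 1u << 1,
    CKPT_SKIP_FILE_CONTENTS = 1u << 2
};

/* Hypothetical control-file names, one per feature that can be turned off. */
struct ckpt_knob {
    const char *name;
    unsigned int bit;
};

static const struct ckpt_knob ckpt_knobs[] = {
    { "checkpoint.skip_network",       CKPT_SKIP_NETWORK },
    { "checkpoint.skip_private_mem",   CKPT_SKIP_PRIVATE_MEM },
    { "checkpoint.skip_file_contents", CKPT_SKIP_FILE_CONTENTS }
};

/*
 * Read one control file under dir. Returns 1 if it holds a nonzero
 * integer, 0 if it holds zero, is missing, or cannot be parsed.
 */
static int ckpt_read_knob(const char *dir, const char *name)
{
    char path[4096];
    int value = 0;
    FILE *f;
    int n = snprintf(path, sizeof(path), "%s/%s", dir, name);

    if (n < 0 || (size_t)n >= sizeof(path))
        return 0;
    f = fopen(path, "r");
    if (!f)
        return 0;
    if (fscanf(f, "%d", &value) != 1)
        value = 0;
    fclose(f);
    return value != 0;
}

/* Build the skip mask for a container directory; 0 means checkpoint everything. */
unsigned int ckpt_skip_mask(const char *dir)
{
    unsigned int mask = 0;
    size_t i;

    for (i = 0; i < sizeof(ckpt_knobs) / sizeof(ckpt_knobs[0]); i++)
        if (ckpt_read_knob(dir, ckpt_knobs[i].name))
            mask |= ckpt_knobs[i].bit;
    return mask;
}
```

For the MPI use-case described above, all three files would presumably be set to 1 once at container setup, so nothing needs to be passed on each checkpoint call.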
_______________________________________________
Containers mailing list
Containers@xxxxxxxxxxxxxxxxxxxxxxxxxx
https://lists.linux-foundation.org/mailman/listinfo/containers