Dave Hansen wrote:
> On Fri, 2009-03-13 at 14:01 -0700, Linus Torvalds wrote:
>> On Fri, 13 Mar 2009, Alexey Dobriyan wrote:
>>>> Let's face it, we're not going to _ever_ checkpoint any kind of general
>>>> case process. Just TCP makes that fundamentally impossible in the general
>>>> case, and there are lots and lots of other cases too (just something as
>>>> totally _trivial_ as all the files in the filesystem that don't get rolled
>>>> back).
>>> What do you mean here? Unlinked files?
>> Or modified files, or anything else. "External state" is a pretty damn
>> wide net. It's not just TCP sequence numbers and another machine.
>
> This is precisely the reason that we've focused so hard on containers,
> and *didn't* just jump right into checkpoint/restart; we're trying
> really hard to constrain the _truly_ external things that a process can
> interact with.
>
> The approach so far has largely been to make things that are external to a
> process at least *internal* to a container. Network, pid, ipc, and uts
> namespaces, for example. An ipc/sem.c semaphore may be external to a
> process, so we'll just pick the whole namespace up and checkpoint it
> along with the process.
>
> In the OpenVZ case, they've at least demonstrated that the filesystem
> can be moved largely with rsync. Unlinked files need some in-kernel TLC
> (or /proc mangling) but it isn't *that* bad.

And in Zap we have successfully used a log-structured filesystem
(specifically NILFS) to continuously snapshot the file system atomically
with taking a checkpoint, so we can easily branch off past checkpoints,
file system included.

And unlinked files can be (inefficiently) handled by saving their full
contents with the checkpoint image - it's not a big toll on many apps
(if you exclude Wine and UML...). At least that's a start.

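To make the "save their full contents" idea concrete, here is a rough
user-space sketch. It is only an illustration, not how Zap or any in-kernel
c/r code does it. It assumes Linux /proc, permission to read the target's
fds, and that a regular file with a link count of zero is "unlinked"; the
function names and the fd-N.data naming are made up for this example.

```python
import os
import shutil
import stat


def unlinked_fds(pid):
    """Return sorted [(fd, link_target)] for regular files that `pid`
    holds open but that no longer have any name in the filesystem."""
    fd_dir = f"/proc/{pid}/fd"
    found = []
    for name in os.listdir(fd_dir):
        path = os.path.join(fd_dir, name)
        try:
            st = os.stat(path)           # follows the link to the open file's inode
            target = os.readlink(path)   # typically ends in " (deleted)" for unlinked files
        except OSError:
            continue                     # fd was closed while we were scanning
        if stat.S_ISREG(st.st_mode) and st.st_nlink == 0:
            found.append((int(name), target))
    return sorted(found)


def save_unlinked(pid, image_dir):
    """Copy the full contents of each unlinked open file into image_dir.
    Returns {fd: saved_path}; files that cannot be read are skipped."""
    os.makedirs(image_dir, exist_ok=True)
    saved = {}
    for fd, _target in unlinked_fds(pid):
        out = os.path.join(image_dir, f"fd-{fd}.data")
        try:
            with open(f"/proc/{pid}/fd/{fd}", "rb") as src, open(out, "wb") as dst:
                shutil.copyfileobj(src, dst)
        except OSError:
            continue
        saved[fd] = out
    return saved
```

A real checkpoint would also have to record each fd's offset and flags and
freeze the process while copying; this sketch does neither.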
>
> We can also make the fs problem much easier by using things like dm or
> btrfs snapshotting of the block device, or restricting to where on a fs
> a container is allowed to write with stuff like r/o bind mounts.

(or NILFS)

So we argue that FS snapshotting is related to c/r, but orthogonal to it
in terms of implementation.

Oren.