Quoting Matt Helsley (matthltc@xxxxxxxxxx):
> On Wed, Sep 08, 2010 at 08:09:31AM -0500, Serge E. Hallyn wrote:
> > Quoting Matthieu Fertré (matthieu.fertre@xxxxxxxxxxx):
> > > Hi,
> > >
> > > Here is a proposal for a C/R related feature already developed in
> > > Kerrighed: file substitution at restart.
> > >
> > > The goal of this mail is to start a discussion about adding such feature
> > > to Linux cr. Comments are welcome!
> >
> > Yup, AFAIK metacluster and zap do this too. I don't think there is
> > any question about whether we want to support this, but rather
> > what the user-kernel API should look like. Perhaps the easiest
> > "API" is to have the userspace program rewrite the checkpoint image,
> > but that probably isn't quite as simple as just substituting #s in
> > the image, bc we'll have to also find the place where the source of
> > the original fd was specified and tweak that.
> >
> > I assume this is one of the things Oren would have 'cradvise()'
> > do, and at this point that sounds nice to me - might be worth
> > seeing how the community reacts. Sentiments on such things change,
> > after all.
> >
> > Have there been any other suggestions?
>
> I think it can be split into two composable pieces which may also be
> useful independently.
>
> The first uses the fcntl() interface to add a flag like
> O_CLOEXEC. Unlike O_CLOEXEC it marks an fd for preservation during
> restart. That way we don't have to specify an fd number and a "source"
> to the kernel. Just tell the kernel to keep the fd. The source can
> be opened and dup2'd via userspace. This is useful without the
> second piece if we want to simply add rather than replace an fd.

Can you think of any other use for this flag other than restart? If
so, then having a fcntl flag (and later madvise) makes sense. But if
we're going to add options to various different APIS which really are
all only useful for c/r, then maybe a single new cr_advise() really
does make sense. The alternative may be more popular at first but
would IMO turn into a disaster.

> Then a separate interface/tool is needed to ignore/delete
> the extra CKPT_OBJ_FILE in the checkpoint image. That's the difficult
> part. It's difficult because depending on the open file the portions of
> the image to ignore/delete can vary wildly. For instance, imagine if an
> epoll fd was being ignored. It starts much like a generic file but there
> is an image header related to it that isn't a CKPT_OBJ_*. If we fail to
> delete/ignore this section prior to parsing then it completely breaks
> the parsing.

Yup, that is precisely what stopped me when I tried to do this 6
months or so ago just for stdin/stdout/stderr.

> In contrast, CKPT_OBJ_* do not break the parsing since
> they aren't expected in a strict order -- the parser is capable of
> parsing them at any time and the only order constraint on them is that
> they appear in the image before they are referenced.
> This piece is also useful by itself if we want to ignore/delete an fd
> rather than substitute it.

Are you working on any of this?

_______________________________________________
Containers mailing list
Containers@xxxxxxxxxxxxxxxxxxxxxxxxxx
https://lists.linux-foundation.org/mailman/listinfo/containers
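
For reference, the userspace half of the first piece discussed above (open the source, dup2 it onto the wanted fd number, then mark the fd so restart keeps it) might look roughly like the sketch below. This is only an illustration under stated assumptions: substitute_fd() is a made-up helper name, and FD_KEEP_ON_RESTART is a hypothetical fd flag that does not exist in any kernel, so that step is compiled out unless someone defines it. Only open(), dup2(), close() and fcntl() with F_GETFD/F_SETFD are real interfaces here.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Open `path` and install it at descriptor number `target_fd`, as a
 * userspace restart helper might do before handing control to the
 * kernel restart code. `flags` must not include O_CREAT, since no mode
 * argument is passed to open().
 * Returns target_fd on success, -1 on error (errno is set).
 */
static int substitute_fd(const char *path, int flags, int target_fd)
{
    assert(path != NULL && target_fd >= 0);

    int src = open(path, flags);
    if (src < 0)
        return -1;

    if (src != target_fd) {
        /* dup2() closes target_fd first if it is already open. */
        if (dup2(src, target_fd) < 0) {
            int saved = errno;
            close(src);
            errno = saved;
            return -1;
        }
        close(src);
    }

#ifdef FD_KEEP_ON_RESTART
    /*
     * HYPOTHETICAL: FD_KEEP_ON_RESTART is not a real flag. It stands in
     * for the proposed "preserve this fd across restart" marker and is
     * set the same way FD_CLOEXEC is set today.
     */
    int fdflags = fcntl(target_fd, F_GETFD);
    if (fdflags < 0 ||
        fcntl(target_fd, F_SETFD, fdflags | FD_KEEP_ON_RESTART) < 0)
        return -1;
#endif

    return target_fd;
}
```

With something like this, replacing stdout at restart would be substitute_fd("/some/new/log", O_WRONLY, STDOUT_FILENO), with the path chosen by whoever drives the restart. Note that this covers only the "keep the fd" side; it does nothing about ignoring or deleting the matching CKPT_OBJ_FILE (or the epoll-style extra headers) in the image, which is the hard part described above.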