On Thu, Sep 09, 2010 at 12:37:20PM +0200, Louis Rilling wrote:
> On 08/09/10 21:06 -0700, Matt Helsley wrote:
> > On Wed, Sep 08, 2010 at 08:03:52PM -0500, Serge E. Hallyn wrote:
> > > Quoting Matt Helsley (matthltc@xxxxxxxxxx):
> > > > On Wed, Sep 08, 2010 at 08:09:31AM -0500, Serge E. Hallyn wrote:
> > > > I think it can be split into two composable pieces which may also be
> > > > useful independently.
> > > >
> > > > The first uses the fcntl() interface to add a flag like
> > > > O_CLOEXEC. Unlike O_CLOEXEC it marks an fd for preservation during
> > > > restart. That way we don't have to specify an fd number and a "source"
> > > > to the kernel. Just tell the kernel to keep the fd. The source can
> > > > be opened and dup2'd via userspace. This is useful without the
> > > > second piece if we want to simply add rather than replace an fd.
> > >
> > > Can you think of any other use for this flag other than restart?
> >
> > <joking>
> > I can't think of any other uses for O_CLOEXEC.
> > </joking>
> >
> > Seriously though, restart will be used _much_ less often than exec so yes
> > it does seem like a waste of a valuable bit and something that wouldn't
> > quite belong in an fcntl interface.
> >
> > However we can try to be a tad clever -- we could (ab|re)use O_CLOEXEC.
> > Right now restart closes all file descriptors and pays absolutely
> > no attention to O_CLOEXEC. We could reuse O_CLOEXEC to mean O_CLOREST
> > too. Have user-cr's restart tool mark all unwanted fds O_CLOEXEC. Any we
> > want to keep we do not mark with O_CLOEXEC.
>
> This would also be useful at checkpoint, to tell sys_checkpoint() which fds
> should be ignored, being because it is not supported or because the application
> has a better way to deal with it.

True. Though unlike restart I don't think we can just (ab|re)use O_CLOEXEC
for that purpose.

>
> >
> >
> > Here's another idea which I haven't fully thought out yet.
> >
> > We could introduce the concept of object id substitutions in the image.
> > So the image would look like (going from file pos 0 at the top..):
> >
> > 0 +-------------------------------+
> >   |                               |
> >               .....
> >   +-------------------------------+
> >   |      <substitute object>      | <--- object with id == <substitute id>
> >               .....
> >   +---------------+---------------+
> >   |  <object id>  |<substitute id>|
> >   +---------------+---------------+
> >               .....
> >   +---------------+---------------+
> >   |      <object to ignore>       | <-- object with id == <object id>
> >               .....
> >
> > (The above is ignoring the ckpt_hdr fields..)
> >
> > When we read the image during restart we use the substitute ids to
> > create indirect objhash entries. When we encounter an obj id and
> > it refers to an indirect entry we first parse the object (ignoring
> > errors and dropping references on new objhash insertions), flip
> > a bit on the indirect entry (indicating the object has been parsed),
> > and then lookup the substitute id and return whatever that resolved to.
> >
> > We can ignore the new objhash objects by making the objhash have its
> > own operation struct. When we're parsing an object that's been
> > substituted we just temporarily set the objhash add/lookup operations
> > to something suitable for properly dropping references to the new
> > object(s). This way we don't have to add checks for this peculiar
> > need all over the checkpoint/restart code. Sure it'll be slower...
>
> If at checkpoint we can take care to ignore files that we know will be
> substituted, this should not be that slower.

So, would you say typically it's the application developer who knows
what to ignore? Are we expecting distros/packagers to be able to set that
up? Admins? These specific optimizations seem like they would be a bit
fragile unless the application developer is involved.
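
To make the substitution idea above a bit more concrete, here is a rough
userspace model of the lookup, written in Python rather than kernel C so it
can be run on its own. It is only a sketch under my own assumptions: the
names (ObjHash, add_substitution, restore) are made up for illustration and
are not the objhash API in the checkpoint/restart patches, and "dropping
references" is modeled by simply discarding the parsed object.

```python
class ObjHash:
    """Toy model of a restart-time object hash (hypothetical, not the kernel objhash)."""

    def __init__(self):
        self.objects = {}   # object id -> restored object
        self.indirect = {}  # object id -> {"substitute": id, "parsed": bool}

    def add_substitution(self, obj_id, substitute_id):
        # Built when restart reads an <object id>/<substitute id> record.
        self.indirect[obj_id] = {"substitute": substitute_id, "parsed": False}

    def restore(self, obj_id, parse):
        """Handle the image record for obj_id; parse() builds the object."""
        entry = self.indirect.get(obj_id)
        if entry is None:
            # Ordinary object: parse it and insert it into the hash.
            obj = parse()
            self.objects[obj_id] = obj
            return obj
        # Substituted object: still parse it so the image is consumed,
        # but ignore errors and never insert the result.
        try:
            parse()
        except Exception:
            pass
        entry["parsed"] = True  # the "bit" flipped on the indirect entry
        # Resolve to whatever the substitute id refers to.
        return self.objects[entry["substitute"]]


# Order follows the image layout: substitute object, then the
# substitution record, then the object to ignore.
objhash = ObjHash()
objhash.restore(7, lambda: "fd supplied at restart")
objhash.add_substitution(3, 7)
resolved = objhash.restore(3, lambda: "file from the checkpoint image")
```

In this model the substitute object has to appear in the image before the
object it replaces, which matches the layout in the diagram. Whether the
real code would want that ordering constraint is an open question.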

Cheers,
	-Matt Helsley
_______________________________________________
Containers mailing list
Containers@xxxxxxxxxxxxxxxxxxxxxxxxxx
https://lists.linux-foundation.org/mailman/listinfo/containers