Quoting Eric W. Biederman (ebiederm@xxxxxxxxxxxx):
> Dave Hansen <dave@xxxxxxxxxxxxxxxxxx> writes:
>
> > On Wed, 2009-03-18 at 13:03 -0700, Mike Waychison wrote:
> >> Polluting the dmesg buffer with messages from common failures (consider
> >> a multi-user cluster where checkpoints may or may not succeed) isn't
> >> very useful.
> >
> > Yeah, I've already gotten an earful from Serge and Dan S. about this. :)
> >
> > Serge suggested that, perhaps, the audit framework could be used. We
> > might also use an ftrace buffer if we want to keep a whole ton of
> > messages around, too.
> >
> > dmesg is definitely not workable long-term at all.
>
> How about having place holder objects in the generated checkpoint.
> Then instead of having a failure you have a non-restoreable checkpoint.
> But you know which fd, or which mmaped region, or which other thing
> is causing the problem and if you want more information you can
> look at that resource.
>
> That gives user space the freedom and scrub out the non-checkpointable
> bits and replace them with something like /dev/null so that we can
> continue on and restore the checkpoint anyway, if we think our
> app can cope with some things going away.
>
> Eric

I like this idea. Subsystems which are temporarily entirely unsupported
(like sysvipc) would need at least a dummy section in the format wherein
we can say 'unsupported'; otherwise we'll still just get a meaningless
-EINVAL.

I actually got bitten yesterday by trying to checkpoint a task that wasn't
frozen. I forgot v14 had that check, and my failures (a segfault,
actually) weren't helpful.

-serge
_______________________________________________
Containers mailing list
Containers@xxxxxxxxxxxxxxxxxxxxxxxxxx
https://lists.linux-foundation.org/mailman/listinfo/containers
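
To make the placeholder idea from this thread a bit more concrete, here is a
rough user-space sketch in Python. It is only an illustration under stated
assumptions: the names Record, Checkpoint, and scrub are hypothetical, and
the record layout has nothing to do with the actual checkpoint image format.
It shows a checkpoint that records unsupported resources as placeholders
instead of failing, reports which resources block a restore, and lets user
space swap placeholder fds for something like /dev/null.

```python
from dataclasses import dataclass, field, replace
from typing import List, Optional


@dataclass(frozen=True)
class Record:
    """One resource in a (hypothetical) checkpoint image."""
    kind: str                                  # e.g. "fd", "mmap", "sysvipc"
    ident: str                                 # e.g. fd number or region name
    payload: Optional[dict] = None             # saved state, if supported
    unsupported_reason: Optional[str] = None   # set => this is a placeholder

    @property
    def is_placeholder(self) -> bool:
        return self.unsupported_reason is not None


@dataclass
class Checkpoint:
    records: List[Record] = field(default_factory=list)

    def blockers(self) -> List[Record]:
        """Resources that make this checkpoint non-restorable."""
        return [r for r in self.records if r.is_placeholder]

    def restorable(self) -> bool:
        return not self.blockers()


def scrub(cp: Checkpoint, replacement_path: str = "/dev/null") -> Checkpoint:
    """Return a copy in which placeholder fds point at replacement_path.

    Assumption: only fds have an obvious stand-in, so other placeholder
    kinds are left in place and still block the restore.
    """
    scrubbed = []
    for r in cp.records:
        if r.is_placeholder and r.kind == "fd":
            r = replace(r, payload={"path": replacement_path},
                        unsupported_reason=None)
        scrubbed.append(r)
    return Checkpoint(scrubbed)


if __name__ == "__main__":
    cp = Checkpoint([
        Record("fd", "3", payload={"path": "/tmp/log"}),
        Record("fd", "4", unsupported_reason="unsupported file type"),
        Record("sysvipc", "shm:0", unsupported_reason="unsupported"),
    ])
    for r in cp.blockers():
        print(f"cannot restore {r.kind} {r.ident}: {r.unsupported_reason}")
    print("restorable after scrub:", scrub(cp).restorable())
```

In this sketch the sysvipc placeholder survives the scrub, so the caller
still sees exactly which subsystem is unsupported rather than a bare -EINVAL.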