Oren Laadan wrote:
>
> Serge E. Hallyn wrote:
>> Quoting Oren Laadan (orenl@xxxxxxxxxxxxxxx):
>>> Serge E. Hallyn wrote:
>>>> Quoting Oren Laadan (orenl@xxxxxxxxxxxxxxx):
>>>>> Sukadev Bhattiprolu wrote:
>>>>>> From: Sukadev Bhattiprolu <sukadev@xxxxxxxxxxxxxxxxxx>
>>>>>> Date: Fri, 13 Mar 2009 17:25:42 -0700
>>>>>> Subject: [PATCH 5/6] Define and use proc_pid_checkpointable()
>>>>>>
>>>>>> Create a proc file, /proc/pid/checkpointable, which shows '1' if
>>>>>> task is checkpointable and '0' if it is not.
>>>>>>
>>>>>> To determine whether a task is checkpointable, the handler for this
>>>>>> new proc file, shares the same code with sys_checkpoint().
>>>> Hey Oren,
>>>>
>>>> 3 counter-points:
>>>>
>>>>> I still don't understand why we would like to do it this way.
>>>>>
>>>>> First, it makes little sense to do it per-task, because we are supposed
>>>>> to checkpoint an entire container.
>>>> Yes we need per-container info too. Actually, per-checkpoint-job-init,
>>>> so if we send pids in for that, it should return false if we send in the
>>>> pid of a task which isn't a proper checkpoint-job-init.
>>>>
>>>> But we also want the info per-task, for debugging info.
>>>>
>>> My suggestion works for these two: we add a flag CR_CTX_DRYRUN; a task
>>> can ask to checkpoint itself, or another task, with CR_CTX_DRYRUN and
>>> the checkpoint code runs without actual effect. (If we don't want to
>>> expose the actual flag to userspace, then we simply use it in an
>>> implementation of a /proc/PID/checkpointable operation).
>> Hmm, so if we pass in CR_CTX_DRYRUN, then the fd can point to a file
>> wherein to store a text representation of the reason?
>>
>> Dave will probably hate it, but it could be worse...
>
> Either that.
>
> Or continue using a debugfs interface, except that the implementation
> will go through an internal interface as I suggest.
>
> Or spit the reason on the kernel console so the user can check dmesg
> for it (like when 'modprobe' fails).

It'd be nice if the error message could be obtained directly from the
call. Would it make sense to add a new packet to the output stream
(cr_hdr->type == CR_HDR_FAILURE) with something descriptive in the body
of the packet? Userspace could then scan the stream for that packet
whenever sys_checkpoint fails.

Polluting the dmesg buffer with messages from common failures isn't
very useful (consider a multi-user cluster where checkpoints may or may
not succeed).

_______________________________________________
Containers mailing list
Containers@xxxxxxxxxxxxxxxxxxxxxxxxxx
https://lists.linux-foundation.org/mailman/listinfo/containers
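
To make the CR_HDR_FAILURE idea above a bit more concrete, here is a rough
userspace sketch of how a tool might scan a checkpoint image for such a
packet after sys_checkpoint fails. Everything about the layout is an
assumption for illustration: the header fields (type, len, parent), their
sizes, the numeric value of CR_HDR_FAILURE, and the helper names
pack_record() and find_failure() are all hypothetical and not taken from
the patches.

```python
import io
import struct

# ASSUMED header layout, for illustration only: s16 type, s16 body length,
# u32 parent. The real cr_hdr layout is not given in this thread.
CR_HDR_FMT = "=hhI"
CR_HDR_SIZE = struct.calcsize(CR_HDR_FMT)

# HYPOTHETICAL type value for the proposed failure packet.
CR_HDR_FAILURE = 0x7F00


def pack_record(rtype, body, parent=0):
    """Build one record: assumed header followed by the raw body bytes."""
    return struct.pack(CR_HDR_FMT, rtype, len(body), parent) + body


def find_failure(stream):
    """Return the text of the first failure packet, or None if there is none."""
    while True:
        hdr = stream.read(CR_HDR_SIZE)
        if len(hdr) < CR_HDR_SIZE:
            return None  # end of stream, or a truncated header
        rtype, length, _parent = struct.unpack(CR_HDR_FMT, hdr)
        if length < 0:
            return None  # corrupt length field
        body = stream.read(length)
        if len(body) < length:
            return None  # truncated body
        if rtype == CR_HDR_FAILURE:
            return body.decode("utf-8", errors="replace")
```

With something along these lines, a checkpoint wrapper could call
find_failure() on the image file only when sys_checkpoint returns an error
and print the reason to the user who requested the checkpoint, rather than
asking them to dig through dmesg.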