On Mon, Mar 07, 2011 at 01:35:03PM -0600, Serge E. Hallyn wrote:
> Hey,
>
> I'm using ckpt-v23-rc1-pids branches of both linux-cr and user-cr (with
> a few trivial fixes, packaged at
> https://launchpad.net/~appcr/+archive/ppa). Checkpoint is being done
> using
> /usr/bin/appcheckpoint -N -l $vardir/log $pid > $vardir/ckpt
> and restart with
> /usr/bin/apprestart -vd --mntns --mount-pty --pids -l $vardir/rlog < $vardir/ckpt
>
> The resulting checkpoint file is attached as 'ckpt', the result
> of 'ckptinfo -vp $ckpt' is in ckpt-ckptinfo, and the kernel log
> (partial) from the checkpoint operation is in kernel.log. When I
> restart, it fails with console output shown in restart.console.out.

Hi Serge,

Unfortunately ckptinfo is broken. My most recent series of 18 patches
to user-cr fixes it. I've attached the output using an earlier version
of the kernel headers than found in the -pids branch. Consequently
there are a couple "UNKOWN" hdrs but otherwise it should provide you
with much more useful information since the broken version fails to
iterate over the entire checkpoint file.

<snip>

> <1012>====== TASKS
> <1012> [ 0] pid 0( 1) (tgid 1) ppid 0( 0) (pgid -4097) sid -1(-4097) creator 0( 0)
> <1012> [ 1] pid 0( 2) (tgid 2) ppid 0( 1) (pgid 2) sid 0( 2) creator 0( 0) prev 301
> <1012> [ 2] pid 0( 3) (tgid 3) ppid 0( 1) (pgid -4097) sid -1(-4097) creator 301( 0) S
> <1012> [ 3] pid 0( 4) (tgid 4) ppid 0( 1) (pgid 4) sid 0( 4) creator 0( 0) next 301 prev 0
> <1012> [ 4] pid 0( 5) (tgid 5) ppid 0( 4) (pgid 5) sid 0( 5) creator 0( 0) next 0 prev 0
> <1012> [ 5] pid 0( 6) (tgid 6) ppid 0( 5) (pgid 5) sid 0( 5) creator 0( 0) next 0 placeholder 301
> <1012> [ 6] pid 301( 7) (tgid -1) ppid 0( 1) (pgid -1) sid -1(-4097) creator 0( 0) next 0 prev 0 D
> <1012>............
> <1012>new pidns without init
> <1012>forking coordinator in new pidns
> <1>fork child vpid 0 flags 0x1
> <1>task 0 forking with flags 11 numpids 1
> <1>task 0 pids: 0
> <1>...
> <1>forked child vpid 2 (asked 0)
> failed to create specific pid with eclone
> <1013>====== PIDS ARRAY
> <1013>[0] pid 0 depth 0
> <1013>[1] pid 0 depth 0
> <1013>[2] pid 0 depth 0
> <1013>[3] pid 0 depth 0
> <1013>[4] pid 0 depth 0
> <1013>[5] pid 0 depth 0
> <1013>............
> <1012>Coordinator failed to report status
> <1012>SIGCHLD: child not ready
> <1012>SIGCHLD: already collected
> <1012>c/r failed ?
> restart: Input/output error
> Restart failed
> root@cr-natty-i386:/home/serge#

I've been having trouble with earlier versions of user-cr than the
-pids branch. The symptoms were that sys_restart() failed to read a
single byte of the checkpoint image -- despite the stdio output of the
restart userspace program above. To see if you're experiencing the same
problem, look at dmesg and see that it dies at "pos 0" in the
checkpoint file with "Expecting type 1" -- that indicates it hasn't
even read the first hdr!

It seems like the "fork feeders" aren't outputting anything and/or the
pipes are hooked up wrong in user-cr's restart, because I get EPIPE
right at the beginning. This was masked by some troubles with
ckpt_err() which I still haven't been able to figure out -- the "error"
reported in dmesg above is -512 but really should be -32 (EPIPE). You
rewrote that code, didn't you? ;)

Anyhow, today I managed to bisect my user-cr troubles down to:

	5b97422c4c1342a128df508cda7c4639ecb24a36

Revert was not clean and the resolved conflicts on top of the revert
did not seem to fix the problem, soo... :/ all I can recommend at the
moment is sticking to the earlier branches.

Hope that helps. Again, sorry for the delay.

Cheers,
	-Matt Helsley
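
P.S. For anyone who wants to script the dmesg check described above, here is a rough Python sketch, not anything shipped with user-cr. It pulls out log lines containing either of the two markers mentioned ("pos 0" and "Expecting type 1") and decodes the two error values discussed. The function names are made up for illustration, and the exact layout of the kernel's message line is an assumption, so the matching lines still need to be read by eye. The value 512 is ERESTARTSYS in the kernel's internal include/linux/errno.h, a code that is not supposed to reach userspace; 32 is EPIPE.

```python
import errno
import os
import re

# Kernel-internal code from include/linux/errno.h; it has no userspace
# errno name, so it is handled separately here.
ERESTARTSYS = 512


def describe_error(code):
    """Return a readable name for a (possibly negative) kernel error value."""
    n = abs(code)
    if n == ERESTARTSYS:
        return "ERESTARTSYS (kernel-internal restart code)"
    name = errno.errorcode.get(n)
    if name is None:
        return "unknown errno %d" % n
    return "%s (%s)" % (name, os.strerror(n))


def find_symptom_lines(dmesg_text):
    """Return log lines that contain either marker from the description.

    Assumption: the markers appear literally as "pos 0" and
    "Expecting type 1"; the real message layout may differ.
    """
    hits = []
    for line in dmesg_text.splitlines():
        if re.search(r"\bpos 0\b", line) or "Expecting type 1" in line:
            hits.append(line)
    return hits


if __name__ == "__main__":
    import sys

    # Usage: dmesg | python this_script.py
    for hit in find_symptom_lines(sys.stdin.read()):
        print(hit)
    print("-512 is", describe_error(-512))
    print("-32 is", describe_error(-32))
```

If the markers show up, the restart most likely died before reading the first hdr, which matches the EPIPE symptom rather than a problem in the image itself.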
Attachment:
serge-ckpt.img.info
Description: ckptinfo output