Quoting Raghu D K (dk.raghu@xxxxxxxxx):
> Hi,
>
> > 1. add '-l logfile' arguments to checkpoint and restart commands,
> >    to put more debug messages into 'logfile' (which must not yet
> >    exist)
> > 2. add '-v' argument to checkpoint and restart for debugging
> > 3. look at /var/log/syslog for lots of error messages, assuming
> >    you have CONFIG_CHECKPOINT_DEBUG (or whatever that is called)
> >    set in your kernel
> > 4. after doing checkpoint, use 'ckptinfo', which came with the
> >    user-cr programs, to analyze the checkpoint image
>
> I have done all of these and tried even the "ckptinfo", which every
> reports the error of "unexpected end of file".

If 'ckptinfo -ve' is not showing info, then presumably the checkpoint
failed early. Check the syslog right after the checkpoint to start
investigating where it stopped.

> > I suspect what happened to you, though, is that you left file
> > descriptors open. If you look at counterloop/crcounter.c in
> > the tests, it does 'for i in (1..100) close(i)'. The problem
> > with not doing this is that the program you are checkpointing has
> > inherited file descriptors from its parent task, and, at restart,
> > it has no way to recreate those.
>
> I am not testing the sample scripts, I just wrote a sample one as I am
> not able to understand
> how the Linux CR is supposed to work.
> 1. Is it mandatory to have the "mount -tcgroup -o freezer cgroup
> /cgroup" mounted ?

Yes. And you must freeze the task before checkpointing.

> 2. Do we have to launch programs using "nsexec" to be able to
> checkpoint and restart them ?

They should be in their own namespaces; nsexec is an easy way to
accomplish that. If you look at

https://code.launchpad.net/~serge-hallyn/+junk/crdemo

it has some scripts, including 'start_job.sh', which starts an isolated
job so that the 'container' (not an lxc container) is checkpointable.
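To make the two points above concrete (close inherited descriptors in
the program you want to checkpoint, and freeze the task before
checkpointing), here is a rough C sketch. The function names
close_inherited_fds() and write_freezer_state() are made up for
illustration and are not taken from crcounter.c or the user-cr tools.
The only interface assumed is the cgroup v1 freezer's 'freezer.state'
file, which accepts "FROZEN" and "THAWED"; the path you pass in depends
on where the freezer cgroup is mounted and what your group is called.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Close every descriptor in [first, last] and return how many were
 * actually open. close() failing with EBADF just means that slot was
 * not in use, so failures are not counted and not treated as errors.
 */
int close_inherited_fds(int first, int last)
{
    int closed = 0;

    for (int fd = first; fd <= last; fd++) {
        if (close(fd) == 0)
            closed++;
    }
    return closed;
}

/*
 * Write a state string ("FROZEN" or "THAWED") to a freezer.state file.
 * state_path is supplied by the caller (hypothetical example:
 * "/cgroup/<group>/freezer.state"). Returns 0 on success, -1 on error.
 */
int write_freezer_state(const char *state_path, const char *state)
{
    int fd = open(state_path, O_WRONLY);
    if (fd < 0)
        return -1;

    size_t len = strlen(state);
    ssize_t written = write(fd, state, len);
    int rc = (written == (ssize_t)len) ? 0 : -1;

    if (close(fd) != 0)
        rc = -1;
    return rc;
}
```

The test program would call something like close_inherited_fds(1, 100)
early in main(), matching the range quoted above; note that this also
closes stdout and stderr, so the program has to log to a file it opens
itself. Whatever drives the checkpoint would call
write_freezer_state(path, "FROZEN") before running checkpoint and
write_freezer_state(path, "THAWED") afterwards.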
-serge
_______________________________________________
Containers mailing list
Containers@xxxxxxxxxxxxxxxxxxxxxxxxxx
https://lists.linux-foundation.org/mailman/listinfo/containers