Hi

On 2010/03/20, at 0:34, Oren Laadan wrote:

>
>
> Jiro SEKIBA wrote:
>> Hi,
>> On 2010/03/18, at 5:55, Serge E. Hallyn wrote:
>>> Quoting Jiro SEKIBA (jir@xxxxxxxxxxxxxxxxx):
>>>> Hi,
>>>>
>>>> Thank you for prompt reply!
>>>> Sorry that I didn't post to containers@xxxxxxxxxxxxxxxxxxxxxxxxxxx
>>>>
>>>> On 2010/03/16, at 7:55, Oren Laadan wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Thanks for taking the time to evaluate c/r. You may want to also
>>>>> try the latest, which is (as of now) ckpt-v20-rc2.
>>>> Yeah, I'll eventually try to keep up with the latest,
>>>> but I just want to try the one you think is stable first anyway.
>>>>
>>>>> In the future, please CC the containers mailing list for issues
>>>>> related to c/r, at "containers@xxxxxxxxxxxxxxxxxxxxxxxxxx".
>>>>>
>>>>> Jiro SEKIBA wrote:
>>>>>> Hi,
>>>>>> I'm trying to evaluate external checkpoint/restart with cr-v19 kernel.
>>>>>> However, when I restart, I got "Killed" message in stdout.
>>>>>> Do you have any tips or clue that are not in
>>>>>> Documentation/checkpoint/usage.txt ?
>>>>>> I'm using kernel pulled from
>>>>>> git://git.ncl.cs.columbia.edu/pub/git/linux-cr.git .
>>>>>> checkout tag named "ckpt-v19". Base distro is ubuntu 9.10.
>>>>>> I ran self checkpoint/restart sample program in Documentation/checkpoint.
>>>>>> It works as written in usage.txt.
>>>>>> However, I can not make external checkpoint/restart work properly.
>>>>>> I made a simple test program below and create checkpoint externally using
>>>>>> the program in Documentation/checkpoint/, it looks checkpoint file is
>>>>>> created properly.
>>>>>> However, when I ran self_restart < ckpt.image, I got "Killed" message.
>>>>> If you take an external checkpoint, then you need to match it
>>>>> with an external restart, as opposed to self_restart.
>>>>>
>>>>> Otherwise, restarting with self_restart from a checkpoint that is
>>>>> not a self-checkpoint can yield unexpected results.
>>>>>
>>>>> Since you don't mention in your post, I don't know if you are using
>>>>> the tools from user-cr. If not, then you should use 'checkpoint' and
>>>>> 'restart' tools from there. It is available from:
>>>>> git://git.ncl.cs.columbia.edu/pub/git/user-cr.git
>>>>> (use the same branch as the one you used for linux-cr).
>>>>>
>>>>> Once you have the tools compiled, and you checkpoint with the
>>>>> 'checkpoint' utility from there, you can restart with:
>>>>> restart -v < ckpt.image
>>>>>
>>>> Thank you for the information.
>>>> Actually I was trying to create checkpoint in Documentation/checkpoint.
>>>>
>>>> Now, I tried with user-cr, compiled binary in the same tag (ckpt-v19).
>>>> Creating checkpoint looks OK and restart -v shows it Success. nice!
>>>> However, the contents in /tmp/test.out never get further,
>>>> it remains same as when created checkpoint.
>>>>
>>>> I tried "./restart -F /cgroup/0 -v --no-pidns < ckpt.image", got Success.
>>>> cat /cgroup/0/tasks tells that there is a process.
>>>> ps shows ./test. So, it looks restarting.
>>>>
>>>> # ps axuww |grep $(cat /cgroup/0/tasks )
>>>> root 7231 0.1 0.0 1588 64 pts/0 D 16:57 0:00 ./test
>>>> root 7238 0.0 0.1 2716 660 pts/1 R+ 16:57 0:00 grep 7231
>>>>
>>>> under the /proc, one file descriptor opened, and it is /tmp/test.out
>>>>
>>>> # ls -l /proc/$(cat /cgroup/0/tasks)/fd
>>>> total 0
>>>> lrwx------ 1 root root 64 Mar 16 16:58 0 -> /tmp/test.out
>>>>
>>>> Nhh, it's close..
>>>>
>>>> I found that when I mount cgroup with -o freezer, self_checkpoint won't work.
>>>> It worked even when I didn't mount the cgroup.
>>>> Is it what you expect?
>>> No, it is not. Can you tell us more about exactly how it fails?
>>>
>> OK, I've checked differences of dmesg when self_restart does well and doesn't.
>> When it goes well, the filename is /tmp/cr-self.out
>> [ 401.522556] [2307:2307:c/r:ckpt_read_fname:571] read filename '/tmp/cr-self.out'
>> [ 401.522558] [2307:2307:c/r:restore_open_fname:594] fname '/tmp/cr-self.out' flags 0x2
>
> This means that restart wants to re-open the file /tmp/cr-self.out.

>> However, when the contents of file remains, filename is /tmp/cr-self.out.orig,
>> which is, of course, the original file bound to the original process.
>> [ 1088.414250] [2951:2951:c/r:ckpt_read_fname:571] read filename '/tmp/cr-self.out.orig'
>> [ 1088.414253] [2951:2951:c/r:restore_open_fname:594] fname '/tmp/cr-self.out.orig' flags 0x2
>
> This means that restart wants to re-open the file /tmp/cr-self.out.orig.
>
> Could it be that these two restart attempts use two distinct image files
> as input ?
>

It's not. I ran the same script each time, which runs self_checkpoint, sleep, mv/cp file, and self_restart.
And sometimes it's OK (meaning cr-self.out grows after restart), sometimes it's not.

> The first one seems to correspond to something like:
> 1) start the test, 2) checkpoint, 3) mv file and cp file, 4) restart
>
> The second one seems to correspond to something like:
> 1) start the test, 2) mv file and cp file, 3) checkpoint, 4) restart
>
> What is the actual error reported when it doesn't work ? (from restart
> and from the kernel log)
>

OK, that makes sense. I tried the following shell script.
If I sleep 4 instead of 3, I got the expected result 100% so far.

--------- self_checkpoint.sh ---------
./self_checkpoint > self.image &
sleep 3;
mv /tmp/cr-self.out /tmp/cr-self.out.orig;
cp /tmp/cr-self.out.orig /tmp/cr-self.out;
sed -i 's/count/xxxxx/g' /tmp/cr-self.out;
./self_restart < self.image
--------- self_checkpoint.sh ---------

self_checkpoint.c creates self.image when the counter i reaches 2, and it sleeps one second each time the loop starts.
So if I run the instructions in usage.txt from a script, sleeping 3 seconds right after starting
self_checkpoint may not be enough sometimes: the mv/cp can then happen before the checkpoint is taken.
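
The timing dependence described above could probably be avoided by polling for the image file instead of sleeping a fixed number of seconds. The following is only a minimal sketch: wait_for_image is a hypothetical helper (it is not part of linux-cr or user-cr), and it assumes the image is complete once the file is non-empty and its size is the same on two checks one second apart. That is a heuristic, not something the c/r tools guarantee.

```shell
#!/bin/sh
# Hypothetical helper, not part of linux-cr/user-cr.
# Wait until FILE is non-empty and its size has stopped changing,
# or give up after TIMEOUT seconds (default 30).
# Assumption: "same size on two consecutive checks" means the writer is done.
wait_for_image() {
    file=$1
    timeout=${2:-30}
    elapsed=0
    prev=-1
    while [ "$elapsed" -lt "$timeout" ]; do
        if [ -s "$file" ]; then
            # Arithmetic expansion strips any padding wc may print.
            size=$(( $(wc -c < "$file") ))
            if [ "$size" -eq "$prev" ]; then
                return 0
            fi
            prev=$size
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 1
}

# Intended use in self_checkpoint.sh (left commented out here because it
# needs the self_checkpoint/self_restart binaries from the kernel tree):
#
#   ./self_checkpoint > self.image &
#   wait_for_image self.image 30 || exit 1
#   mv /tmp/cr-self.out /tmp/cr-self.out.orig
#   cp /tmp/cr-self.out.orig /tmp/cr-self.out
#   sed -i 's/count/xxxxx/g' /tmp/cr-self.out
#   ./self_restart < self.image
```

With this, the mv/cp step should only run after the checkpoint image has been written, whatever the scheduling delay was.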
>> I can not reproduce it yet, but at least the cgroup freezer option doesn't have the effect I mentioned.
>> Sorry that it might confuse you.
>> I still can not restart from an external checkpoint.
>> I'll try v20 next time.
>
> If it doesn't work, can you please describe again the exact order of
> commands that you use and the reported error(s) ?
>

I'll let you know in any case.
Thank you very much for the advice

regards,

> Oren.
>
>>> Maybe get the cr_tests (either from Oren's tree or from
>>> git clone git://git.sr71.net/~hallyn/cr_tests.git), cd cr_tests,
>>> make, cd simple, run ./ckpt and send us the contents of
>>> /tmp/log, dmesg, and ckptinfo -ve /tmp/out ?
>> I think it runs OK, but send it in case.
>> /tmp/log was empty by the way.
>> thanks
>>>> Thank you again for the help!
>>>> I'm feeling better to use the latest ..
>>> -serge
_______________________________________________
Containers mailing list
Containers@xxxxxxxxxxxxxxxxxxxxxxxxxx
https://lists.linux-foundation.org/mailman/listinfo/containers