Came across this while testing LXC.

1. Does ckpt_remount_proc() need to unshare()? Or can we have the
   clone() that calls __ckpt_coordinator() clone with
   CLONE_NEWNS|CLONE_FS instead?

   The problem with the unshare() in ckpt_remount_proc() is that it
   creates an extra level in the cgroup hierarchy (see below) after
   restart. So applications expecting the cgroup hierarchy from before
   the checkpoint will be surprised.

2. When --mount-pty (or --mntns) is specified, do we need to unshare()
   in the parent process? Considering only the full-container restart
   for now (ignore self-restart and subtree restart), can we just
   specify (CLONE_NEWNS|CLONE_FS) at the time of creating the first
   restarted process?

Here is an example (using LXC) that shows the problems I am running into.
Attached is a quick hack to point out the unshare() calls I am referring to.

If I create a simple container with LXC

	$ lxc-execute --name foo --rcfile lxc-macvlan.conf -- /bin/sleep 1000

it creates the following three processes:

	PID   PPID  CMD
	3350  3239  lxc-execute --name foo -- /bin/sleep 1000
	3353  3350  /usr/local/libexec/lxc-init -- /bin/sleep 1000
	3357  3353  /bin/sleep 1000

A new cgroup is created named 'foo' (which is basically a user-space
rename of the pid of the lxc-init). This cgroup is in the root cgroup
directory and has two tasks (lxc-init, sleep):

	$ cat /cgroup/foo/tasks
	3353
	3357

When I checkpoint and restart this container (using the equivalent of
the --pidns --pids --mount-pty options to /bin/restart),
I get three processes:

	3434  3375  ./lxc_restart --name bar --statefile=/root/foo.ckpt
	3436  3434  /usr/local/libexec/lxc-init -- /bin/sleep 1000
	3437  3436  /bin/sleep 1000

But the directory in /cgroup referring to lxc-init is 3 levels deep:

	$ ls /cgroup/3434/3436/1
	cgroup.procs  freezer.state  notify_on_release  tasks

Here is the complete hierarchy created after the restart:

	$ ls -R /cgroup/3434
	/cgroup/3434:
	3436  cgroup.procs  freezer.state  notify_on_release  tasks

	/cgroup/3434/3436:
	1  cgroup.procs  freezer.state  notify_on_release  tasks

	/cgroup/3434/3436/1:
	cgroup.procs  freezer.state  notify_on_release  tasks

	$ cat /cgroup/3434/tasks
	3434

	$ cat /cgroup/3434/3436/tasks
	# empty

	$ cat /cgroup/3434/3436/1/tasks
	3436
	3437

I think we get the directory /cgroup/3434 due to the following unshare():

	/* private mounts namespace ? */
	if (args->mntns && unshare(CLONE_NEWNS | CLONE_FS) < 0) {
		ckpt_perror("unshare");
		exit(1);
	}

And we get the "3436/1" directory due to the unshare() in
ckpt_remount_proc().

The following hack seems to fix both levels, and the lxc_restart command
correctly creates just the "/cgroup/3436" cgroup (which LXC renames to
"/cgroup/bar").

---
From: Sukadev Bhattiprolu <sukadev@xxxxxxxxxxxxxxxxxx>
Date: Mon, 8 Mar 2010 12:03:46 -0800
Subject: [PATCH 1/1] Minimize unshare() calls

---
 restart.c |    9 ++++++++-
 1 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/restart.c b/restart.c
index c82de21..6ac51e3 100644
--- a/restart.c
+++ b/restart.c
@@ -459,10 +459,12 @@ int app_restart(struct app_restart_args *args)
 		exit(1);

 	/* private mounts namespace ? */
+#if 0
 	if (args->mntns && unshare(CLONE_NEWNS | CLONE_FS) < 0) {
 		ckpt_perror("unshare");
 		exit(1);
 	}
+#endif

 	/* chroot ? */
 	if (args->root && chroot(args->root) < 0) {
@@ -717,10 +719,12 @@ static int ckpt_probe_child(pid_t pid, char *str)
  */
 static int ckpt_remount_proc(struct ckpt_ctx *ctx)
 {
+#if 0
 	if (unshare(CLONE_NEWNS | CLONE_FS) < 0) {
 		ckpt_perror("unshare");
 		return -1;
 	}
+#endif
 	/* this is unlikely, but we don't want to fail */
 	if (umount2("/proc", MNT_DETACH) < 0) {
 		if (ckpt_cond_fail(ctx, CKPT_COND_MNTPROC)) {
@@ -778,6 +782,7 @@ static int ckpt_coordinator_pidns(struct ckpt_ctx *ctx)
 	int copy, ret;
 	genstack stk;
 	void *sp;
+	unsigned long flags = SIGCHLD;

 	ckpt_dbg("forking coordinator in new pidns\n");

@@ -802,7 +807,9 @@ static int ckpt_coordinator_pidns(struct ckpt_ctx *ctx)

 	copy = ctx->args->copy_status;
 	ctx->args->copy_status = 1;
-	coord_pid = clone(__ckpt_coordinator, sp, CLONE_NEWPID|SIGCHLD, ctx);
+	flags |= CLONE_NEWPID|CLONE_NEWNS|CLONE_FS;
+
+	coord_pid = clone(__ckpt_coordinator, sp, flags, ctx);
 	genstack_release(stk);
 	if (coord_pid < 0) {
 		ckpt_perror("clone coordinator");
--
1.6.6.1

_______________________________________________
Containers mailing list
Containers@xxxxxxxxxxxxxxxxxxxxxxxxxx
https://lists.linux-foundation.org/mailman/listinfo/containers
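
A note on the flags passed to clone() in the hack, offered as a minimal
sketch and not as a tested change to restart.c. clone(2) documents EINVAL
when CLONE_NEWNS and CLONE_FS are given together: for clone(), CLONE_FS
means "share the filesystem information with the parent", whereas for
unshare() it means "stop sharing it". A child created with CLONE_NEWNS and
without CLONE_FS already gets its own copy of that information, so
CLONE_NEWNS alone should be what is wanted on the clone() side. The helper
name coordinator_clone_flags() and its two parameters are hypothetical and
do not exist in restart.c.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>

/*
 * Hypothetical helper (not part of restart.c): build the flags for the
 * clone() that starts the coordinator.
 *
 * CLONE_FS is deliberately left out. In clone() it requests sharing of
 * the filesystem information, and clone(2) rejects the combination
 * CLONE_NEWNS | CLONE_FS with EINVAL.
 */
unsigned long coordinator_clone_flags(int pidns, int mntns)
{
	unsigned long flags = SIGCHLD;	/* exit signal sent to the parent */

	if (pidns)
		flags |= CLONE_NEWPID;	/* coordinator in a new pid namespace */
	if (mntns)
		flags |= CLONE_NEWNS;	/* private mount namespace at creation */

	return flags;
}
```

Under that assumption, the call in ckpt_coordinator_pidns() would read
clone(__ckpt_coordinator, sp, coordinator_clone_flags(1, ctx->args->mntns), ctx).
Whether this is enough to replace both unshare() calls is still the open
question raised in points 1 and 2.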