Hello all,

I found incorrect behavior in the propagate_mnt() function on kernels 3.13.x and later. It seems to have been introduced by the patch "Smarter propagate_mnt":

http://patchwork.ozlabs.org/patch/345152/

Consider the following case. Process A has one mount tree for the root file system and forks three child processes (B to D). Each child process then unshares its mount namespace and makes its root tree a slave of master 1:

process A : / (shared 1)
process B : / (master 1)
process C : / (master 1)
process D : / (master 1)

If process A attaches a new tmpfs mount at /tmp, the event is propagated to all of its slaves. The final state should look like this:

process A : / (shared 1), /tmp (shared 2)
process B : / (master 1, slave node), /tmp (master 2) <-- slave state (ok)
process C : / (master 1, slave node), /tmp (master 2) <-- slave state (ok)
process D : / (master 1, slave node), /tmp (master 2) <-- slave state (ok)

However, with propagate_mnt() on kernels 3.13.x and later, the /tmp mounts of processes C and D end up both shared and slave:

process A : / (shared 1), /tmp (shared 2)
process B : / (master 1, slave node), /tmp (master 2) <-- slave state (ok)
process C : / (master 1, slave node), /tmp (shared 3, master 2) <-- shared and slave state (?)
process D : / (master 1, slave node), /tmp (shared 3, master 2) <-- shared and slave state (?)

You can reproduce this with the following procedure.
(I tested with kernels 3.13.x, 3.16.x and 3.19.x.)

root@wj-VirtualBox:~# mkdir root
root@wj-VirtualBox:~# mount -t tmpfs none root
root@wj-VirtualBox:~# mount --make-shared root
root@wj-VirtualBox:~# mkdir root/tmp
root@wj-VirtualBox:~# cat /proc/self/mountinfo
33 22 0:25 / /home/wj/root rw,relatime shared:1 - tmpfs none rw

Create a child process:

root@wj-VirtualBox:~# unshare -m xterm &
[1] 2161

In that xterm (pid 2161), make its root tree a slave and create two child processes:

root@wj-VirtualBox:~# mount --make-slave root
root@wj-VirtualBox:~# unshare -m xterm &
[1] 2348
root@wj-VirtualBox:~# unshare -m xterm &
[2] 2398
root@wj-VirtualBox:~# cat /proc/self/mountinfo
52 36 0:25 / /home/wj/root rw,relatime master:1 - tmpfs none rw
53 36 0:25 / /home/wj/mnt rw,relatime - tmpfs none rw

Mount a new tmpfs on root/tmp in the initial terminal (i.e., the shared root node):

root@wj-VirtualBox:~# mount -t tmpfs none root/tmp/

Check in the first child xterm (pid 2161):

root@wj-VirtualBox:~# cat /proc/self/mountinfo
52 36 0:25 / /home/wj/root rw,relatime master:1 - tmpfs none rw
93 52 0:26 / /home/wj/root/tmp rw,relatime master:2 - tmpfs none rw

Check in each of the other child xterms (pids 2348 and 2398):

root@wj-VirtualBox:~# cat /proc/self/mountinfo
90 74 0:25 / /home/wj/root rw,relatime master:1 - tmpfs none rw
94 90 0:26 / /home/wj/root/tmp rw,relatime shared:3 master:2 - tmpfs none rw

I think this is caused by the condition check in the propagate_one() function. In the above case, the event propagation sequence is 'A -> B -> C -> D'. When the event is processed from B to C, the variable 'last_dest' is the root of process B and the variable 'm' is that of process C. Because both of them are slave mounts, their mnt_group_id is zero, so the comparison succeeds and CL_MAKE_SHARED is used for cloning the tree.

static int propagate_one(struct mount *m)
...
	if (m->mnt_group_id == last_dest->mnt_group_id) {
		type = CL_MAKE_SHARED;
	} else {
		...
		type = CL_SLAVE;
	}
	...
	child = copy_tree(last_source, last_source->mnt->mnt_root, type);

Even if the 'else' branch is taken in the above code, the clone_mnt() function invoked by copy_tree() seems to have another defect. Note that the variable 'old' is the mount of process B and 'mnt' is the newly cloned one for process C.

static struct mount *clone_mnt(struct mount *old, struct dentry *root,
...
	if ((flag & CL_SLAVE) ||
	    ((flag & CL_SHARED_TO_SLAVE) && IS_MNT_SHARED(old))) {
		list_add(&mnt->mnt_slave, &old->mnt_slave_list);
		mnt->mnt_master = old;

I made a temporary patch to prevent this unexpected behavior, but I'm not sure it is safe and works correctly for all propagation cases. Please review the patch below and let me know your opinion.

Thanks,
Woojoong

diff --git a/fs/namespace.c b/fs/namespace.c
index ee39eed..94f6994 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1170,7 +1170,7 @@ static struct mount *clone_mnt(struct mount *old, struct dentry *root,
 	list_add_tail(&mnt->mnt_instance, &sb->s_mounts);
 	unlock_mount_hash();
 
-	if ((flag & CL_SLAVE) ||
+	if (((flag & CL_SLAVE) && !IS_MNT_SLAVE(old)) ||
 	    ((flag & CL_SHARED_TO_SLAVE) && IS_MNT_SHARED(old))) {
 		list_add(&mnt->mnt_slave, &old->mnt_slave_list);
 		mnt->mnt_master = old;
diff --git a/fs/pnode.c b/fs/pnode.c
index 19d994d..ad4b84e 100644
--- a/fs/pnode.c
+++ b/fs/pnode.c
@@ -238,7 +238,7 @@ static int propagate_one(struct mount *m)
 	if (!is_subdir(mp->m_dentry, m->mnt.mnt_root))
 #endif
 		return 0;
-	if (m->mnt_group_id == last_dest->mnt_group_id) {
+	if (m->mnt_group_id && m->mnt_group_id == last_dest->mnt_group_id) {
 		type = CL_MAKE_SHARED;
 	} else {
 		struct mount *n, *p;
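
For illustration only, the group-id comparison in propagate_one() discussed above can be modelled in userspace roughly as follows. This is a simplified sketch, not kernel code: the names model_mount, pick_type_original, pick_type_patched and the MODEL_CL_* constants are hypothetical, and the extra work done in the real 'else' branch is left out. It assumes, as described above, that a slave-only mount has mnt_group_id == 0.

```c
/* Userspace model of the clone-type decision; all names are hypothetical. */
enum model_clone_type { MODEL_CL_SLAVE, MODEL_CL_MAKE_SHARED };

struct model_mount {
	int mnt_group_id;	/* 0 models "not a member of any peer group" */
};

/* Models the original check: plain equality of the two group ids. */
static enum model_clone_type
pick_type_original(const struct model_mount *m,
		   const struct model_mount *last_dest)
{
	if (m->mnt_group_id == last_dest->mnt_group_id)
		return MODEL_CL_MAKE_SHARED;
	return MODEL_CL_SLAVE;
}

/* Models the proposed check: a zero group id never counts as a match. */
static enum model_clone_type
pick_type_patched(const struct model_mount *m,
		  const struct model_mount *last_dest)
{
	if (m->mnt_group_id && m->mnt_group_id == last_dest->mnt_group_id)
		return MODEL_CL_MAKE_SHARED;
	return MODEL_CL_SLAVE;
}
```

With A modelled as group id 1 and B and C as group id 0, the original check picks MODEL_CL_SLAVE for the A -> B step but MODEL_CL_MAKE_SHARED for the B -> C step, while the patched check picks MODEL_CL_SLAVE for both.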