A question about the propagate_mnt() behavior

Hello all, 

I found what looks like incorrect behavior in the propagate_mnt() function on kernels 3.13.x and later.
It seems to have been introduced by the patch below, titled "Smarter propagate_mnt":
http://patchwork.ozlabs.org/patch/345152/

Consider the following case.
Process A has one mount tree for the root file system and forks three child processes (B to D).
Each child process then unshares its mount namespace
and makes its root tree a slave of master 1.

process A  :  / (shared 1)
process B  :  / (master 1)
process C  :  / (master 1)
process D  :  / (master 1)

If process A attaches a new tmpfs mount at its /tmp path,
the event is propagated to all of its slaves.
The final state should look like this:

process A  :  / (shared 1), /tmp (shared 2)
process B  :  / (master 1, slave node), /tmp (master 2) <-- slave state (ok)
process C  :  / (master 1, slave node), /tmp (master 2) <-- slave state (ok)
process D  :  / (master 1, slave node), /tmp (master 2) <-- slave state (ok)

However, with the propagate_mnt() function in kernels 3.13.x and later,
the /tmp mount of processes C and D ends up both shared and slave.

process A  :  / (shared 1), /tmp (shared 2)
process B  :  / (master 1, slave node), /tmp (master 2) <-- slave state (ok)
process C  :  / (master 1, slave node), /tmp (shared 3, master 2) <-- shared and slave state (?)
process D  :  / (master 1, slave node), /tmp (shared 3, master 2) <-- shared and slave state (?)

You can reproduce this with the following steps. (I tested with kernels 3.13.x, 3.16.x and 3.19.x.)

root@wj-VirtualBox:~# mkdir root
root@wj-VirtualBox:~# mount -t tmpfs none root
root@wj-VirtualBox:~# mount --make-shared root
root@wj-VirtualBox:~# mkdir root/tmp
root@wj-VirtualBox:~# cat /proc/self/mountinfo 
33 22 0:25 / /home/wj/root rw,relatime shared:1 - tmpfs none rw

Creating a child process
root@wj-VirtualBox:~# unshare -m xterm &
[1] 2161

Making its root tree a slave and creating two child processes from the xterm (pid 2161)
root@wj-VirtualBox:~# mount --make-slave root
root@wj-VirtualBox:~# unshare -m xterm &
[1] 2348
root@wj-VirtualBox:~# unshare -m xterm &
[2] 2398

root@wj-VirtualBox:~# cat /proc/self/mountinfo 
52 36 0:25 / /home/wj/root rw,relatime master:1 - tmpfs none rw
53 36 0:25 / /home/wj/mnt rw,relatime - tmpfs none rw

Mounting a new tmpfs on root/tmp from the initial terminal (i.e., the shared root node)
root@wj-VirtualBox:~# mount -t tmpfs none root/tmp/

Checking on the first child xterm (pid 2161)
root@wj-VirtualBox:~# cat /proc/self/mountinfo 
52 36 0:25 / /home/wj/root rw,relatime master:1 - tmpfs none rw
93 52 0:26 / /home/wj/root/tmp rw,relatime master:2 - tmpfs none rw

Checking on each of the child xterms (2348 and 2398)
root@wj-VirtualBox:~# cat /proc/self/mountinfo 
90 74 0:25 / /home/wj/root rw,relatime master:1 - tmpfs none rw
94 90 0:26 / /home/wj/root/tmp rw,relatime shared:3 master:2 - tmpfs none rw
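
To check many namespaces at once, the propagation tags can be read out of mountinfo programmatically. The following is a minimal C sketch and was not part of the test above; struct propagation, parse_propagation() and is_shared_and_slave() are hypothetical helper names. It assumes the documented mountinfo layout of six fixed fields followed by optional fields up to a lone "-".

```c
#include <stdlib.h>
#include <string.h>

/* Peer-group ids taken from the optional fields of one mountinfo line.
 * A value of 0 means the tag was absent. (Hypothetical helper type.) */
struct propagation {
    int shared;  /* N from "shared:N" */
    int master;  /* N from "master:N" */
};

/* Parse one line of /proc/<pid>/mountinfo.
 * Returns 0 on success, -1 if the line is too long or has no "-" separator
 * after the six fixed fields. */
static int parse_propagation(const char *line, struct propagation *out)
{
    char buf[1024];
    char *tok;
    int field = 0;

    out->shared = 0;
    out->master = 0;
    if (strlen(line) >= sizeof(buf))
        return -1;
    strcpy(buf, line);

    for (tok = strtok(buf, " \n"); tok != NULL; tok = strtok(NULL, " \n")) {
        field++;
        if (field <= 6)
            continue;   /* id, parent id, dev, root, mount point, options */
        if (strcmp(tok, "-") == 0)
            return 0;   /* end of the optional fields */
        if (strncmp(tok, "shared:", 7) == 0)
            out->shared = atoi(tok + 7);
        else if (strncmp(tok, "master:", 7) == 0)
            out->master = atoi(tok + 7);
    }
    return -1;
}

/* True when the mount is a slave and also a member of a peer group. */
static int is_shared_and_slave(const struct propagation *p)
{
    return p->shared != 0 && p->master != 0;
}
```

Fed the two root/tmp lines above, this flags the line from pids 2348 and 2398 (shared:3 master:2) but not the one from pid 2161 (master:2 only).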

I think this is caused by the condition check in the propagate_one() function.
In the above case, the event propagation sequence is 'A -> B -> C -> D'.

When the event is processed from B to C, the variable 'last_dest' refers to the root mount of 'B'
and the variable 'm' refers to that of process 'C'. Because both of them are slaves,
their mnt_group_id is zero, so the comparison succeeds and CL_MAKE_SHARED is used for cloning the tree.

static int propagate_one(struct mount *m)
    ...
    if (m->mnt_group_id == last_dest->mnt_group_id) {
       type = CL_MAKE_SHARED;
    } else {
    ...
       type = CL_SLAVE;
    }
    ...
    child = copy_tree(last_source, last_source->mnt->mnt_root, type);
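
The effect of the comparison can be shown in isolation with a toy model in user space. This is only a sketch: struct toy_mount, enum clone_type and both pick_type_*() functions are made up for illustration, the enum values are arbitrary stand-ins for the kernel flags, and the elided work in the real 'else' branch is not modeled.

```c
/* Arbitrary stand-ins for CL_MAKE_SHARED and CL_SLAVE (not the kernel values). */
enum clone_type { TYPE_MAKE_SHARED, TYPE_SLAVE };

/* Only the field the check reads; 0 means "not in any peer group". */
struct toy_mount {
    int mnt_group_id;
};

/* The comparison as quoted from propagate_one(). */
static enum clone_type pick_type_original(const struct toy_mount *m,
                                          const struct toy_mount *last_dest)
{
    if (m->mnt_group_id == last_dest->mnt_group_id)
        return TYPE_MAKE_SHARED;
    return TYPE_SLAVE;
}

/* The same comparison with an additional non-zero test on m->mnt_group_id. */
static enum clone_type pick_type_patched(const struct toy_mount *m,
                                         const struct toy_mount *last_dest)
{
    if (m->mnt_group_id && m->mnt_group_id == last_dest->mnt_group_id)
        return TYPE_MAKE_SHARED;
    return TYPE_SLAVE;
}
```

With two pure slaves (both ids 0), pick_type_original() selects the shared type while pick_type_patched() selects the slave type; for two real peers with the same non-zero id, both select the shared type.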

Even if the 'else' branch is taken in the above code,
the clone_mnt() function invoked by copy_tree() seems to have another defect.
Note that the variable 'old' is the mount of process 'B' and 'mnt' is the newly cloned one for process 'C'.

static struct mount *clone_mnt(struct mount *old, struct dentry *root,
   ...
   if ((flag & CL_SLAVE) ||
      ((flag & CL_SHARED_TO_SLAVE) && IS_MNT_SHARED(old))) {
      list_add(&mnt->mnt_slave, &old->mnt_slave_list);
      mnt->mnt_master = old;

I made a temporary patch to prevent this unexpected behavior,
but I'm not sure it is safe and works correctly for all propagation cases.

Please review the patch below and let me know your opinion.
Thanks,
Woojoong

diff --git a/fs/namespace.c b/fs/namespace.c
index ee39eed..94f6994 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1170,7 +1170,7 @@ static struct mount *clone_mnt(struct mount *old, struct dentry *root,
        list_add_tail(&mnt->mnt_instance, &sb->s_mounts);
        unlock_mount_hash();

-       if ((flag & CL_SLAVE) ||
+       if (((flag & CL_SLAVE) && !IS_MNT_SLAVE(old)) ||
            ((flag & CL_SHARED_TO_SLAVE) && IS_MNT_SHARED(old))) {
                list_add(&mnt->mnt_slave, &old->mnt_slave_list);
                mnt->mnt_master = old;
diff --git a/fs/pnode.c b/fs/pnode.c
index 19d994d..ad4b84e 100644
--- a/fs/pnode.c
+++ b/fs/pnode.c
@@ -238,7 +238,7 @@ static int propagate_one(struct mount *m)
        if (!is_subdir(mp->m_dentry, m->mnt.mnt_root))
#endif
                return 0;
-       if (m->mnt_group_id == last_dest->mnt_group_id) {
+       if (m->mnt_group_id && m->mnt_group_id == last_dest->mnt_group_id) {
                type = CL_MAKE_SHARED;
        } else {
                struct mount *n, *p;
