Re: Bug: for mount namespaces inside a chroot, unshare works but nsenter doesn't

Karel Zak <kzak@xxxxxxxxxx> writes:

> On Fri, Oct 27, 2017 at 06:07:00PM +0000, Ximin Luo wrote:
>> When unsharing persistent mount namespaces, unshare+nsenter does not seem to
>> work properly when run from inside a chroot session. However, unshare by itself
>> works.
>
> It's not related to persistent namespaces, but to the way nsenter
> uses chroot().

At a practical level it is related to persistent namespaces, as this
problem comes up nowhere else.

In the non-persistent case you can do:
nsenter --mount=/proc/<pid>/ns/mnt --root=/proc/<pid>/root

Which works because the root directory is in the mount namespace.

>> As a workaround for the unshare+nsenter case, one can run `nsenter --mount=<ns>
>> chroot <real/path/to/chroot> command args`. The `--root` option to `nsenter`
>> sounds like it should work, but it does not - see below for details.
>> 
>> Is this a bug? 
>
> It seems like nsenter logic problem.
>
> The command nsenter opens root-dir and cwd file descriptors *before*
> setns() syscall, and then *after* the syscall it calls chroot(). The
> final process is in the namespace, but not in the root directory.

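That ordering is necessary for the opened file descriptors to have a
well-defined meaning: the paths given on the command line are resolved
in the caller's mount namespace, before setns() changes it.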
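
To illustrate with something that needs no privileges (a sketch only;
fd_pins_directory is a made-up name, and this is an analogy rather than
the setns() case itself): a descriptor keeps referring to whatever the
path named when it was opened, even if the same path string later names
something else.

```shell
# Hypothetical demonstration, not part of nsenter: show that a descriptor
# keeps referring to the directory its path named at open time.
# The function body runs in a subshell, so descriptor 9 is closed on return.
fd_pins_directory() (
    base=$(mktemp -d) || exit 1
    mkdir "$base/chroot" || exit 1
    # Resolve the path now, in the caller's current view of the filesystem.
    exec 9<"$base/chroot" || exit 1
    # Change what the same path string refers to.
    mv "$base/chroot" "$base/original" || exit 1
    mkdir "$base/chroot" || exit 1
    # Ask the kernel which directory descriptor 9 names now.
    target=$(readlink /proc/self/fd/9)
    rm -rf "$base"
    # Print only the last path component.
    printf '%s\n' "${target##*/}"
)
```

Calling fd_pins_directory prints "original" rather than "chroot": the
descriptor follows the directory that was opened, not the path string.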

>         open("/mnt/test/chroot/namespaces/mnt", O_RDONLY) = 3
>         open("/mnt/test/chroot", O_RDONLY)      = 4
>         open("/mnt/test/chroot", O_RDONLY)      = 5
>         setns(3, CLONE_NEWNS)                   = 0
>         close(3)                                = 0
>         fchdir(4)                               = 0
>         chroot(".")                             = 0
>         close(4)                                = 0
>         fchdir(5)                               = 0
>         close(5)                                = 0
>         execve("/bin/bash", ["-bash"], 0x7ffd2b5244d0 /* 31 vars */) = 0

> The patch below fixes the issue. It just moves root-dir and cwd open
> calls *after* the setns():
>
>         open("/mnt/test/chroot/namespaces/mnt", O_RDONLY) = 3
>         setns(3, CLONE_NEWNS)                   = 0
>         close(3)                                = 0
>         open("/mnt/test/chroot", O_RDONLY)      = 3
>         open("/mnt/test/chroot", O_RDONLY)      = 4
>         fchdir(4)                               = 0
>         chroot(".")                             = 0
>         close(4)                                = 0
>         fchdir(3)                               = 0
>         close(3)                                = 0
>         execve("/bin/bash", ["-bash"], 0x7fff1ff8eb60 /* 31 vars */) = 0
>
> Unfortunately, I'm not sure if this is the right way in all cases.

I believe this will break every case except the one mentioned.

My personal recommendation is not to use chroot with persistent mount
namespaces.  That just seems to keep unnecessary mounts around.  Those
extra mounts will almost certainly be a problem later when you discover
you want to unmount one of those mounted filesystems you don't care
about but are chrooting over.

I think it would be quite reasonable to have an additional option to
open things in the new mount namespace, just before exec.  I just don't
see how useful it would be.

A second possibility is to issue a warning if the root directory is not
a member of the target mount namespace.  That might even allow doing the right
thing automatically.  It looks like the mnt_id is available from
/proc/<pid>/fdinfo/<fd#>.  So it looks like it is possible with the
existing kernel interfaces (at least in theory).
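
A rough sketch of that check in shell, under some assumptions: the
kernel exposes mnt_id in fdinfo, the first field of each
/proc/<pid>/mountinfo line is the mount ID, and there is a process in
the target namespace whose mountinfo can be read.  The names mnt_id_of
and mnt_in_namespace_of are made up for illustration and are not part
of util-linux.  I have not run this against the setups in this thread.

```shell
# mnt_id_of PATH: print the mount ID the kernel reports for a descriptor
# opened on PATH.  Hypothetical helper, not part of util-linux.
# The body runs in a subshell, so descriptor 9 is closed on return.
mnt_id_of() (
    exec 9<"$1" || exit 1
    # fdinfo holds lines such as "mnt_id:<TAB>23"; print the value field.
    while read -r key value; do
        if [ "$key" = "mnt_id:" ]; then
            printf '%s\n' "$value"
            exit 0
        fi
    done < /proc/self/fdinfo/9
    exit 1
)

# mnt_in_namespace_of PID PATH: succeed if the mount ID of PATH appears in
# the first field of /proc/PID/mountinfo.  Hypothetical helper.
mnt_in_namespace_of() {
    id=$(mnt_id_of "$2") || return 1
    while read -r mount_id rest; do
        [ "$mount_id" = "$id" ] && return 0
    done < "/proc/$1/mountinfo"
    return 1
}
```

Something like "mnt_in_namespace_of <pid> /mnt/test/chroot || echo
warning" would then be the basis for the warning.  Note that mountinfo
only lists mounts visible from that process's root, so this is a
starting point rather than a complete test.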

Ugh.  It looks like you committed your change below to sys-utils by
accident.

Eric


>
>
> Examples:
>
> *** I have simple chroot directory:
>
>         ls -la /mnt/test/chroot
>         total 20
>         drwxr-xr-x    5 root root 4096 Nov  3 13:10 .
>         drwxr-xr-x.   8 root root 4096 Nov  2 15:36 ..
>         lrwxrwxrwx    1 root root    8 Nov  2 15:40 bin -> /usr/bin
>         lrwxrwxrwx    1 root root    8 Nov  2 15:40 lib -> /usr/lib
>         lrwxrwxrwx    1 root root   10 Nov  2 15:40 lib64 -> /usr/lib64
>         drwxr-xr-x    4 root root 4096 Nov  3 13:22 namespaces
>         dr-xr-xr-x  330 root root    0 Sep 26 22:17 proc
>         lrwxrwxrwx    1 root root    9 Nov  2 15:40 sbin -> /usr/sbin
>         drwxr-xr-x.  14 root root 4096 Aug 16 10:50 usr
>
> where /usr is bind mounted and /proc is mounted
>
>         # findmnt -oTARGET,SOURCE,FSTYPE,PROPAGATION --submounts /mnt/test/chroot
>         TARGET                  SOURCE                      FSTYPE PROPAGATION
>         /mnt/test/chroot        /dev/sda4[/mnt/test/chroot] ext4   private
>         ├─/mnt/test/chroot/usr  /dev/sda4[/usr]             ext4   shared
>         └─/mnt/test/chroot/proc proc                        proc   private
>
> let's enter the root and create a persistent mount namespace within the chroot:
>
>         # chroot /mnt/test/chroot
>         # unshare --mount=namespaces/mnt
>
> our mount table:
>
>         findmnt -oTARGET,SOURCE,FSTYPE,PROPAGATION
>         TARGET  SOURCE                      FSTYPE PROPAGATION
>         /       /dev/sda4[/mnt/test/chroot] ext4   private
>         ├─/usr  /dev/sda4[/usr]             ext4   private
>         └─/proc proc                        proc   private
>
> and our mount namespace:
>
>         # ls -la /proc/self/ns | grep mnt
>         lrwxrwxrwx 1 0 0 0 Nov  3 12:56 mnt -> mnt:[4026532457]
>
> our pid:
>
>         # echo $$
>         14411
>
> IMHO it is a good idea to keep the shell alive in the chroot and use
> another session to play with nsenter.
>
> *** nsenter examples:
>
> a) let's try it by PID, all works as expected:
>
>         # nsenter --target 14411 --mount --root --wd
>
>         # findmnt -oTARGET,SOURCE,FSTYPE,PROPAGATION
>         TARGET  SOURCE                      FSTYPE PROPAGATION
>         /       /dev/sda4[/mnt/test/chroot] ext4   private
>         ├─/usr  /dev/sda4[/usr]             ext4   private
>         └─/proc proc                        proc   private
>
>         # ls -la /proc/self/ns | grep mnt
>         lrwxrwxrwx 1 0 0 0 Nov  3 13:02 mnt -> mnt:[4026532457]
>
>    Important note: in this case nsenter uses /proc/<target>/root for
>    chroot(), but the goal is to use a persistent namespace where no
>    <target> is available.
>
> b) let's try chroot() by path:
>
>         # nsenter --target 14411 --mount --root=/mnt/test/chroot --wd=/mnt/test/chroot
>
>         # findmnt -oTARGET,SOURCE,FSTYPE,PROPAGATION
>
>    failed, mount table is empty
>
> c) let's try chroot by /proc paths:
>
>         # nsenter --target 14411 --mount --root=/mnt/test/chroot/proc/14411/root --wd=/mnt/test/chroot/proc/14411/cwd
>
>         # findmnt -oTARGET,SOURCE,FSTYPE,PROPAGATION
>         TARGET  SOURCE                      FSTYPE PROPAGATION
>         /       /dev/sda4[/mnt/test/chroot] ext4   private
>         ├─/usr  /dev/sda4[/usr]             ext4   private
>         └─/proc proc                        proc   private
>
>         # ls -la /proc/self/ns | grep mnt
>         lrwxrwxrwx 1 0 0 0 Nov  3 13:09 mnt -> mnt:[4026532457]
>
>    it works!
>
>
> Note that --target or --mount=<persistent> namespace does not change
> anything here.
>
> The nsenter with the patch:
>
>
>         # ./nsenter --mount=/mnt/test/chroot/namespaces/mnt  --root=/mnt/test/chroot --wd=/mnt/test/chroot
>
>         # findmnt -oTARGET,SOURCE,FSTYPE,PROPAGATION
>         TARGET  SOURCE                      FSTYPE PROPAGATION
>         /       /dev/sda4[/mnt/test/chroot] ext4   private
>         ├─/usr  /dev/sda4[/usr]             ext4   private
>         └─/proc proc                        proc   private
>
>         # ls -la /proc/self/ns | grep mnt
>         lrwxrwxrwx 1 0 0 0 Nov  3 13:11 mnt -> mnt:[4026532457]
>
> all works as expected. The patch is below.
>
>     Karel

>
>
> diff --git a/sys-utils/nsenter.c b/sys-utils/nsenter.c
> index 9c452c1d1..464f9f98c 100644
> --- a/sys-utils/nsenter.c
> +++ b/sys-utils/nsenter.c
> @@ -238,6 +238,7 @@ int main(int argc, char *argv[])
>  	int do_fork = -1; /* unknown yet */
>  	uid_t uid = 0;
>  	gid_t gid = 0;
> +	const char *rd_path = NULL, *wd_path = NULL;
>  #ifdef HAVE_LIBSELINUX
>  	bool selinux = 0;
>  #endif
> @@ -318,13 +319,13 @@ int main(int argc, char *argv[])
>  			break;
>  		case 'r':
>  			if (optarg)
> -				open_target_fd(&root_fd, "root", optarg);
> +				rd_path = optarg;
>  			else
>  				do_rd = true;
>  			break;
>  		case 'w':
>  			if (optarg)
> -				open_target_fd(&wd_fd, "cwd", optarg);
> +				wd_path = optarg;
>  			else
>  				do_wd = true;
>  			break;
> @@ -433,6 +434,11 @@ int main(int argc, char *argv[])
>  		}
>  	}
>  
> +	if (wd_path)
> +		open_target_fd(&wd_fd, "cwd", wd_path);
> +	if (rd_path)
> +		open_target_fd(&root_fd, "root", rd_path);
> +
>  	/* Remember the current working directory if I'm not changing it */
>  	if (root_fd >= 0 && wd_fd < 0) {
>  		wd_fd = open(".", O_RDONLY);
>
>     
>
>
>> I'm trying to write code to work regardless of whether it's run
>> inside a chroot, so it would be nice not to have to pass arguments to
>> `nsenter(1)` that are specific to chroots, like `chroot <real/path/to/chroot>`.
>> It's also a bit counterintuitive to have to re-enter the chroot again.
>> 
>> Also, these extra steps are not needed with `unshare(1)`, which works fine by
>> itself. It's solely re-entering the namespace that seems to be problematic.
>> 
>> I'm using util-linux 2.30.2-0.1 on Debian. I don't believe it's a problem
>> specific to Debian, because everything works when using `unshare(1)` by itself,
>> as stated.
>> 
>> (I haven't tried running this inside a chroot-inside-a-chroot.)
>> 
>> Details:
>> 
>> # Below is all run inside a "schroot" session, which is a Debian tool for making chroot use more convenient.
>> # I used the instructions here (https://wiki.debian.org/sbuild#Create_the_chroot) to create one.
>> 
>> ## Preparation for the tests
>> 
>> # Enter the chroot
>> $ sudo schroot -c unstable-amd64-sbuild
>> # Set up a private-bind file to hold a handle to our new namespace, as documented in the man page of unshare(1)
>> (unstable-amd64-sbuild)root@localhost:/tmp# touch ns-mnt; mount --bind --make-private ns-mnt ns-mnt
>> # Set up our test script
>> (unstable-amd64-sbuild)root@localhost:/tmp# script='mount; ls /; ls -l /proc/$$/ns/mnt; mount -B /dev/null /etc/hosts; echo hosts:; cat /etc/hosts'
>> 
>> ## Case 1: unshare(1) with no special options or commands, everything works as expected
>> 
>> (unstable-amd64-sbuild)root@localhost:/tmp# unshare --mount=ns-mnt sh -ec "$script"
>> unstable-amd64-sbuild on / type overlay (rw,relatime,lowerdir=/var/lib/schroot/union/underlay/<<SESSIONID>>,...)
>> proc on /proc type proc (rw,relatime)
>> sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
>> [.. etc. other mappings in my chroot ..]
>> unstable-amd64-sbuild on /tmp/ns-mnt type overlay (rw,relatime,lowerdir=/var/lib/schroot/union/underlay/<<SESSIONID>>,...)
>> bin  boot  build  dev  etc  home  lib  lib64  media  mnt  opt  proc  root  run	sbin  srv  sys	tmp  usr  var
>> lrwxrwxrwx 1 root root 0 Oct 27 17:35 /proc/31691/ns/mnt -> 'mnt:[4026532398]'
>> hosts:
>> [.. empty hosts (inside the namespace) ..]
>> # we are now back outside the namespace
>> # if we cat /etc/hosts (both inside and outside the chroot), we see the original
>> 
>> ## now we try to re-enter the namespace.
>> 
>> ## Case 2: nsenter(1) with no extra options or commands, doesn't work:
>> 
>> (unstable-amd64-sbuild)root@localhost:/tmp# nsenter --mount=ns-mnt sh -ec "$script"
>> [.. mappings for my host system, outside the chroot ..]
>> bin  boot  dev	etc  home  initrd.img  initrd.img.old  lib  lib32  lib64  libx32  lost+found  media  mnt  opt  proc  root  run	sbin  selinux  srv  sys  tmp  usr  var	vmlinuz  vmlinuz.old
>> [.. aka the / on my host filesystem outside the chroot ..]
>> lrwxrwxrwx 1 root root 0 Oct 27 19:36 /proc/32434/ns/mnt -> 'mnt:[4026532398]'
>> [.. correct namespace ..]
>> hosts:
>> [.. empty hosts (inside the namespace) ..]
>> # if we cat /etc/hosts outside the namespace, it's non-empty inside the chroot but EMPTY outside the chroot.
>> # whoops, because we ran mount -B on the original non-chrooted / filesystem. findmnt says:
>> └─/etc/hosts                          udev[/null]                        devtmpfs    rw,nosuid,relatime,size=8181852k,nr_inodes=2045463,mode=755
>> # we unmount it before proceeding
>> 
>> ## Case 3: nsenter(1) with --root, partially works but not really:
>> 
>> (unstable-amd64-sbuild)root@localhost:/tmp# nsenter --root=/ --mount=ns-mnt sh -ec "$script"
>> [.. i.e. mount(1) gives empty output ..]
>> bin  boot  build  dev  etc  home  lib  lib64  media  mnt  opt  proc  root  run	sbin  srv  sys	tmp  usr  var
>> [.. at least the root is inside the chroot ..]
>> lrwxrwxrwx 1 root root 0 Oct 27 17:37 /proc/878/ns/mnt -> 'mnt:[4026532398]'
>> [.. correct namespace ..]
>> mount: /etc/hosts: wrong fs type, bad option, bad superblock on /dev/null, missing codepage or helper program, or other error.
>> [.. mount operations fail, but the namespace is correct ..]
>> [.. if you analyse this case a bit more, you find that /proc/$$/{mounts,mountinfo,mountstats} are all empty ..]
>> # exit code 32
>> # outside the namespace, /etc/hosts is still non-empty, both inside and outside the chroot
>> 
>> ## Case 4: nsenter(1) with explicit chroot(1) call, everything works as expected, again:
>> 
>> (unstable-amd64-sbuild)root@localhost:/tmp# nsenter --mount=ns-mnt chroot /run/schroot/mount/<<SESSIONID>> sh -ec 'mount && ls /'
>> unstable-amd64-sbuild on / type overlay (rw,relatime,lowerdir=/var/lib/schroot/union/underlay/<<SESSIONID>>,...)
>> proc on /proc type proc (rw,relatime)
>> sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
>> [.. etc. other mappings in my chroot ..]
>> unstable-amd64-sbuild on /tmp/ns-mnt type overlay (rw,relatime,lowerdir=/var/lib/schroot/union/underlay/<<SESSIONID>>,...)
>> [.. great, we got our mounts back! ..]
>> bin  boot  build  dev  etc  home  lib  lib64  media  mnt  opt  proc  root  run	sbin  srv  sys	tmp  usr  var
>> lrwxrwxrwx 1 root root 0 Oct 27 17:39 /proc/2025/ns/mnt -> 'mnt:[4026532398]'
>> [.. correct namespace ..]
>> hosts:
>> [.. empty hosts, as desired ..]
>> # outside the namespace, /etc/hosts is still non-empty, both inside and outside the chroot
>> 
>> -- 
>> GPG: ed25519/56034877E1F87C35
>> GPG: rsa4096/1318EFAC5FBBDBCE
>> https://github.com/infinity0/pubkeys.git
>> --
>> To unsubscribe from this list: send the line "unsubscribe util-linux" in
>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> 