Re: [PATCH 0/1] shiftfs: uid/gid shifting filesystem

James Bottomley <James.Bottomley@xxxxxxxxxxxxxxxxxxxxx> · Mon, 06 Jun 2016 15:02:30 -0700

On Sun, 2016-06-05 at 22:11 +0100, Djalal Harouni wrote:
> On Wed, Jun 01, 2016 at 12:41:00PM -0400, James Bottomley wrote:
> > On Wed, 2016-06-01 at 18:21 +0200, Michał Zegan wrote:
> > > As I sent a reply in a ... wrong way, I do it again. my question
> > > was:
> > > Why isn't it done at the vfs layer when you mount the fs in
> > > different
> > > userns, instead of using a separate filesystem for it?
> > 
> > Well, that is what this patch does:
> > 
> > http://thread.gmane.org/gmane.linux.kernel/2214882
> > 
> > However, the reason it doesn't work for me is that I want to be
> > able to
> > unpack the image into a subdirectory (so I'm not dedicating a whole
> > filesystem for this).  This is primarily for a docker hack IBM is
> > working on to allow each container instance to use a separate
> > uid/gid
> > range, so I need something that behaves much more like a bind
> > mount.
> I thought that you were using a loop device ?

No, for architecture emulation containers I use file roots, so
they're subdirectories of my home directory.  The interesting issues
Serge discovered are on ext4, which I needed a loop device to reproduce
(my home directory is xfs), if that's where the confusion arises.

Thinking about containers in general, a significant number use bind
mounted file roots because that's a nice use case that hypervisors
can't match without clusterable filesystems.  However, I do know of
some containers that are block image based, so whatever solution is
chosen has to support both.

>  that's precisely one of the main cases solved by that
> solution... mount the portable fs image on a loop device, set the
> shift, which will only be active in that subdirectory...
> 
> 
> > >  I believe it could be useful to be able to mount all filesystems 
> > > in userns with autoshifted uids, although I do not know security
> > > implications for that usage.
> > 
> > As long as you don't need to subdivide the volume, it works nicely.
> >  However, from a security point of view, that entire volume is now
> > effectively freely writeable by anyone who can set up a userns.  If 
> > you follow the shiftfs route, you can break off writeable
> > subdirectories for each namespace shift, but they can't cross over 
> > into writing subdirectories that belong to other user namespaces 
> > (assuming the uids are fully segregated).
> 
> As said in the other email, I'm not really sure about the use case at
> all... but I give you this quick test with:
> https://gist.githubusercontent.com/tixxdz/6b84c2c3bd6cb987c82255602ec70f23/raw/97c9ab76878f9d7415583c00b22ca0e4a948847b/userns_test.c
> 
> $ mkdir shifted-fedora-tree && sudo mount -t shiftfs -ouidmap=0:1000000:65536,gidmap=0:1000000:65536 ~/fedora-tree/ shifted-fedora-tree

This is basically what I do for my container roots.  However, after
that I tend to set them up with scripts.  I've attached my latest
build-container script at the bottom.  As you can see from my script,
all my build containers are in /home/jejb/containers.
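
For anyone following along, the arithmetic behind a map such as
0:1000000:65536 can be sketched in shell.  This is only an
illustration: it assumes the triple reads
first-id:first-shifted-id:count, the same layout as a line of
/proc/self/uid_map, and map_id is a hypothetical helper, not something
shiftfs provides.

```shell
# map_id ID MAP: hypothetical helper, not part of shiftfs.
# MAP is assumed to be "first:shifted_first:count", like a uid_map line.
map_id() {
    id=$1
    first=${2%%:*}          # text before the first colon
    rest=${2#*:}            # text after the first colon
    target=${rest%%:*}      # middle field
    count=${rest#*:}        # last field
    if [ "$id" -ge "$first" ] && [ "$id" -lt $((first + count)) ]; then
        echo $((target + id - first))
    else
        return 1            # id falls outside the mapped range
    fi
}

map_id 0 0:1000000:65536      # prints 1000000
map_id 48 0:1000000:65536     # prints 1000048
```

An id outside the range (65536 and up with this map) makes the helper
return non-zero instead of printing anything.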

> [tixxdz@fedora-kvm bin]$ sudo ./userns-test -m -U -M "0 1000000 1"
> /bin/bash
> uid=0(root) gid=65534(nfsnobody) groups=65534(nfsnobody)
> context=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
> [root@fedora-kvm bin]# cat /proc/self/uid_map 
>          0    1000000          1
> [root@fedora-kvm bin]# echo "$(id -u)_not_a_sandboxed_app" >> shifted-fedora-tree/etc/fedora-release
> [root@fedora-kvm bin]# exit
> exit
> [tixxdz@fedora-kvm bin]$ sudo ./userns-test -m -U -M "48 1000000 1"
> /bin/bash
> [apache@fedora-kvm bin]$ id
> uid=48(apache) gid=65534(nfsnobody) groups=65534(nfsnobody)
> context=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
> [apache@fedora-kvm bin]$ echo "$(id -u)_not_a_sandboxed_app" >>
> shifted-fedora-tree/etc/fedora-release
> [apache@fedora-kvm bin]$ exit
> exit
> [tixxdz@fedora-kvm bin]$ sudo ./userns-test -m -U -M "70 1000000 1"
> /bin/bash
> [avahi@fedora-kvm bin]$ id
> uid=70(avahi) gid=65534(nfsnobody) groups=65534(nfsnobody)
> context=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
> [avahi@fedora-kvm bin]$ echo "$(id -u)_not_a_sandboxed_app" >>
> shifted-fedora-tree/etc/fedora-release
> [avahi@fedora-kvm bin]$ exit
> exit
> [tixxdz@fedora-kvm bin]$ sudo ./userns-test -m -U -M "1000 1000000 1"
> /bin/bash
> [tixxdz@fedora-kvm bin]$ id
> uid=1000(tixxdz) gid=65534(nfsnobody) groups=65534(nfsnobody)
> context=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
> [tixxdz@fedora-kvm bin]$ echo "$(id -u)_not_a_sandboxed_app" >>
> shifted-fedora-tree/etc/fedora-release
> [tixxdz@fedora-kvm bin]$ exit
> exit
> [tixxdz@fedora-kvm bin]$ cat ~/fedora-tree/etc/fedora-release 
> Fedora release 23 (Twenty Three)
> 0_not_a_sandboxed_app
> 48_not_a_sandboxed_app
> 70_not_a_sandboxed_app
> 1000_not_a_sandboxed_app

It's good to know, but most of the shiftfs bugs are in the vfs, so you
can actually test for them without having to enter a user namespace at
all, because the uid/gid shifting occurs independently.
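
As a rough sketch of that kind of check, something like the following
compares the owner of a file in the underlying tree with the owner
seen through the shifted mount, from the initial namespace.  It
assumes GNU stat and that the shifted view shows the underlying uid
plus a fixed offset; check_shift is a hypothetical helper and the
direction of the shift is my assumption here, so adjust to taste.

```shell
# check_shift LOWER SHIFTED OFFSET: hypothetical helper.
# Succeeds when the owner uid seen at SHIFTED equals the owner uid at
# LOWER plus OFFSET.  Uses GNU stat; no user namespace is entered.
check_shift() {
    lower_uid=$(stat -c %u "$1") || return 2
    shifted_uid=$(stat -c %u "$2") || return 2
    [ "$shifted_uid" -eq $((lower_uid + $3)) ]
}

# Example, assuming the mount quoted earlier in this thread:
# check_shift ~/fedora-tree/etc/fedora-release \
#             shifted-fedora-tree/etc/fedora-release 1000000
```

The same comparison works for gids by swapping %u for %g.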

James
Attachment: build-container
Description: application/shellscript