On Sun, 2016-06-05 at 22:11 +0100, Djalal Harouni wrote:
> On Wed, Jun 01, 2016 at 12:41:00PM -0400, James Bottomley wrote:
> > On Wed, 2016-06-01 at 18:21 +0200, Michał Zegan wrote:
> > > As I sent a reply in a ... wrong way, I do it again. my question was:
> > > Why isn't it done at the vfs layer when you mount the fs in different
> > > userns, instead of using a separate filesystem for it?
> >
> > Well, that is what this patch does:
> >
> > http://thread.gmane.org/gmane.linux.kernel/2214882
> >
> > However, the reason it doesn't work for me is that I want to be able to
> > unpack the image into a subdirectory (so I'm not dedicating a whole
> > filesystem for this). This is primarily for a docker hack IBM is
> > working on to allow each container instance to use a separate uid/gid
> > range, so I need something that behaves much more like a bind mount.
>
> I thought that you were using a loop device ?

No, for architecture emulation containers I use file roots, so they're
subdirectories of my home directory. The interesting issues Serge
discovered are on ext4, which I needed a loop device to reproduce (my
home directory is xfs), if that's where the confusion arises?

Thinking about containers in general, a significant number use
bind-mounted file roots, because that's a nice use case that hypervisors
can't match without clusterable filesystems. However, I do know of some
containers that are block image based, so whatever solution is chosen
has to support both.

> that's precisely one of the main case that's solved with that
> solution... mount the portable fs image into a loop device, set the
> shift which will be only active into that subdirectory...
>
> > > I believe it could be useful to be able to mount all filesystems
> > > in userns with autoshifted uids, although I do not know security
> > > implications for that usage.
> >
> > As long as you don't need to subdivide the volume, it works nicely.
> > However, from a security point of view, that entire volume is now
> > effectively freely writeable by anyone who can set up a userns. If
> > you follow the shiftfs route, you can break off writeable
> > subdirectories for each namespace shift, but they can't cross over
> > into writing subdirectories that belong to other user namespaces
> > (assuming the uids are fully segregated).
>
> As said in the other email, I'm not really sure about the use case at
> all... but I give you this quick test with:
> https://gist.githubusercontent.com/tixxdz/6b84c2c3bd6cb987c82255602ec70f23/raw/97c9ab76878f9d7415583c00b22ca0e4a948847b/userns_test.c
>
> $ mkdir shifted-fedora-tree && sudo mount -t shiftfs -ouidmap=0:1000000:65536,gidmap=0:1000000:65536 ~/fedora-tree/ shifted-fedora-tree

This is basically what I do for my container roots. However, after that
I tend to set them up with scripts. I've attached my latest
build-container script at the bottom. As you can see from my script,
all my build containers are in /home/jejb/containers.
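For anyone reading along, the uidmap=0:1000000:65536 option in the
mount command quoted above can be read as a triple: first id on one
side, first id on the other side, and the number of ids covered. Here
is a minimal sketch of that arithmetic. It assumes the option follows
the same convention as /proc/<pid>/uid_map (inside, outside, count),
and the helper name map_id is hypothetical, not part of shiftfs or any
existing tool.

```shell
#!/bin/sh
# map_id ID INSIDE OUTSIDE COUNT  (hypothetical helper, illustration only)
# Translate ID through one mapping triple, assuming uid_map-style
# semantics: ids INSIDE..INSIDE+COUNT-1 correspond one-to-one to
# OUTSIDE..OUTSIDE+COUNT-1. Prints the mapped id, or returns 1 when ID
# is not covered by the triple.
map_id() {
    id=$1 inside=$2 outside=$3 count=$4
    if [ "$id" -ge "$inside" ] && [ "$id" -lt $((inside + count)) ]; then
        echo $((outside + id - inside))
    else
        return 1
    fi
}

map_id 0 0 1000000 65536        # prints 1000000
map_id 48 0 1000000 65536       # prints 1000048
map_id 48 48 1000000 1          # prints 1000000
map_id 70000 0 1000000 65536 || echo "unmapped"
```

With a count of 65536 every id in the tree gets its own shifted
counterpart, whereas a count of 1 folds a single chosen id onto
1000000 and leaves everything else unmapped.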
> [tixxdz@fedora-kvm bin]$ sudo ./userns-test -m -U -M "0 1000000 1" /bin/bash
> uid=0(root) gid=65534(nfsnobody) groups=65534(nfsnobody) context=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
> [root@fedora-kvm bin]# cat /proc/self/uid_map
> 0 1000000 1
> [root@fedora-kvm bin]# echo "$(id -u)_not_a_sandboxed_app" >> shifted-fedora-tree/etc/fedora-release
> [root@fedora-kvm bin]# exit
> exit
> [tixxdz@fedora-kvm bin]$ sudo ./userns-test -m -U -M "48 1000000 1" /bin/bash
> [apache@fedora-kvm bin]$ id
> uid=48(apache) gid=65534(nfsnobody) groups=65534(nfsnobody) context=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
> [apache@fedora-kvm bin]$ echo "$(id -u)_not_a_sandboxed_app" >> shifted-fedora-tree/etc/fedora-release
> [apache@fedora-kvm bin]$ exit
> exit
> [tixxdz@fedora-kvm bin]$ sudo ./userns-test -m -U -M "70 1000000 1" /bin/bash
> [avahi@fedora-kvm bin]$ id
> uid=70(avahi) gid=65534(nfsnobody) groups=65534(nfsnobody) context=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
> [avahi@fedora-kvm bin]$ echo "$(id -u)_not_a_sandboxed_app" >> shifted-fedora-tree/etc/fedora-release
> [avahi@fedora-kvm bin]$ exit
> exit
> [tixxdz@fedora-kvm bin]$ sudo ./userns-test -m -U -M "1000 1000000 1" /bin/bash
> [tixxdz@fedora-kvm bin]$ id
> uid=1000(tixxdz) gid=65534(nfsnobody) groups=65534(nfsnobody) context=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
> [tixxdz@fedora-kvm bin]$ echo "$(id -u)_not_a_sandboxed_app" >> shifted-fedora-tree/etc/fedora-release
> [tixxdz@fedora-kvm bin]$ exit
> exit
> [tixxdz@fedora-kvm bin]$ cat ~/fedora-tree/etc/fedora-release
> Fedora release 23 (Twenty Three)
> 0_not_a_sandboxed_app
> 48_not_a_sandboxed_app
> 70_not_a_sandboxed_app
> 1000_not_a_sandboxed_app

It's good to know, but most of the shiftfs bugs are in the vfs, so you
can actually test for them without having to enter a user namespace at
all, because the uid/gid shifting occurs independently.

James
Attachment:
build-container
Description: application/shellscript