On 16/12/2019 at 11:06, Christian Brauner wrote:
> On Mon, Dec 16, 2019 at 10:53:28AM +0100, Laurent Vivier wrote:
>> On 16/12/2019 at 10:46, Christian Brauner wrote:
>>> On Mon, Dec 16, 2019 at 10:12:19AM +0100, Laurent Vivier wrote:
>>>> v8: s/file->f_path.dentry/file_dentry(file)/
>>>>
>>>> v7: Use the new mount API
>>>>
>>>>     Replace
>>>>
>>>>     static struct dentry *bm_mount(struct file_system_type *fs_type,
>>>>                     int flags, const char *dev_name, void *data)
>>>>     {
>>>>             struct user_namespace *ns = current_user_ns();
>>>>
>>>>             return mount_ns(fs_type, flags, data, ns, ns,
>>>>                             bm_fill_super);
>>>>     }
>>>>
>>>>     by
>>>>
>>>>     static void bm_free(struct fs_context *fc)
>>>>     {
>>>>             if (fc->s_fs_info)
>>>>                     put_user_ns(fc->s_fs_info);
>>>>     }
>>>>
>>>>     static int bm_get_tree(struct fs_context *fc)
>>>>     {
>>>>             return get_tree_keyed(fc, bm_fill_super, get_user_ns(fc->user_ns));
>>>>     }
>>>>
>>>>     static const struct fs_context_operations bm_context_ops = {
>>>>             .free           = bm_free,
>>>>             .get_tree       = bm_get_tree,
>>>>     };
>>>>
>>>>     static int bm_init_fs_context(struct fs_context *fc)
>>>>     {
>>>>             fc->ops = &bm_context_ops;
>>>>             return 0;
>>>>     }
>>>>
>>>> v6: Return &init_binfmt_ns instead of NULL in binfmt_ns()
>>>>     This should never happen, but to stay safe return a
>>>>     value we can use.
>>>>     change subject from "RFC" to "PATCH"
>>>>
>>>> v5: Use READ_ONCE()/WRITE_ONCE()
>>>>     move mount pointer struct init to bm_fill_super() and add smp_wmb()
>>>>     remove useless NULL value init
>>>>     add WARN_ON_ONCE()
>>>>
>>>> v4: first user namespace is initialized with &init_binfmt_ns,
>>>>     all new user namespaces are initialized with a NULL and use
>>>>     the one of the first parent that is not NULL. The pointer
>>>>     is initialized to a valid value the first time the binfmt_misc
>>>>     fs is mounted in the current user namespace.
>>>>     This keeps the previous behavior unchanged:
>>>>     a new ns inherits values from its parent, and if the parent value is
>>>>     modified (or the parent creates its own binfmt entry by mounting the
>>>>     fs) the child inherits it (unless it has itself mounted the fs).
>>>>
>>>> v3: create a structure to store binfmt_misc data,
>>>>     add a pointer to this structure in the user_namespace structure,
>>>>     in init_user_ns structure this pointer points to an init_binfmt_ns
>>>>     structure. And all new user namespaces point to this init structure.
>>>>     A new binfmt namespace structure is allocated if the binfmt_misc
>>>>     filesystem is mounted in a user namespace that is not the initial
>>>>     one but its binfmt namespace pointer points to the initial one.
>>>>     add override_creds()/revert_creds() around open_exec() in
>>>>     bm_register_write()
>>>>
>>>> v2: no new namespace, binfmt_misc data are now part of
>>>>     the mount namespace
>>>>     I put this in the mount namespace instead of the user namespace
>>>>     because the mount namespace is already needed and
>>>>     I don't want to force the use of a user namespace for that.
>>>>     As this is a filesystem, it seems logical to have it here.
>>>>
>>>> This allows defining a new interpreter for each new container.
>>>>
>>>> But the main goal is to be able to chroot to a directory
>>>> using a binfmt_misc interpreter without being root.
>>>>
>>>> I have a modified version of unshare at:
>>>>
>>>>   https://github.com/vivier/util-linux.git branch unshare-chroot
>>>>
>>>> with some new options to unshare the binfmt_misc namespace and to chroot
>>>> to a directory.
>>>>
>>>> If you have a directory /chroot/powerpc/jessie containing debian for powerpc
>>>> binaries and a qemu-ppc interpreter, you can do for instance:
>>>>
>>>>  $ uname -a
>>>>  Linux fedora28-wor-2 4.19.0-rc5+ #18 SMP Mon Oct 1 00:32:34 CEST 2018 x86_64 x86_64 x86_64 GNU/Linux
>>>>  $ ./unshare --map-root-user --fork --pid \
>>>>    --load-interp ":qemu-ppc:M::\x7fELF\x01\x02\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\x14:\xff\xff\xff\xff\xff\xff\xff\x00\xff\xff\xff\xff\xff\xff\xff\xff\xff\xfe\xff\xff:/qemu-ppc:OC" \
>>>>    --root=/chroot/powerpc/jessie /bin/bash -l
>>>>  # uname -a
>>>>  Linux fedora28-wor-2 4.19.0-rc5+ #18 SMP Mon Oct 1 00:32:34 CEST 2018 ppc GNU/Linux
>>>>  # id
>>>>  uid=0(root) gid=0(root) groups=0(root),65534(nogroup)
>>>>  # ls -l
>>>>  total 5940
>>>>  drwxr-xr-x.   2 nobody nogroup    4096 Aug 12 00:58 bin
>>>>  drwxr-xr-x.   2 nobody nogroup    4096 Jun 17 20:26 boot
>>>>  drwxr-xr-x.   4 nobody nogroup    4096 Aug 12 00:08 dev
>>>>  drwxr-xr-x.  42 nobody nogroup    4096 Sep 28 07:25 etc
>>>>  drwxr-xr-x.   3 nobody nogroup    4096 Sep 28 07:25 home
>>>>  drwxr-xr-x.   9 nobody nogroup    4096 Aug 12 00:58 lib
>>>>  drwxr-xr-x.   2 nobody nogroup    4096 Aug 12 00:08 media
>>>>  drwxr-xr-x.   2 nobody nogroup    4096 Aug 12 00:08 mnt
>>>>  drwxr-xr-x.   3 nobody nogroup    4096 Aug 12 13:09 opt
>>>>  dr-xr-xr-x. 143 nobody nogroup       0 Sep 30 23:02 proc
>>>>  -rwxr-xr-x.   1 nobody nogroup 6009712 Sep 28 07:22 qemu-ppc
>>>>  drwx------.   3 nobody nogroup    4096 Aug 12 12:54 root
>>>>  drwxr-xr-x.   3 nobody nogroup    4096 Aug 12 00:08 run
>>>>  drwxr-xr-x.   2 nobody nogroup    4096 Aug 12 00:58 sbin
>>>>  drwxr-xr-x.   2 nobody nogroup    4096 Aug 12 00:08 srv
>>>>  drwxr-xr-x.   2 nobody nogroup    4096 Apr  6  2015 sys
>>>>  drwxrwxrwt.   2 nobody nogroup    4096 Sep 28 10:31 tmp
>>>>  drwxr-xr-x.  10 nobody nogroup    4096 Aug 12 00:08 usr
>>>>  drwxr-xr-x.  11 nobody nogroup    4096 Aug 12 00:08 var
>>>>
>>>> If you want to use the qemu binary provided by your distro, you can use
>>>>
>>>>  --load-interp ":qemu-ppc:M::\x7fELF\x01\x02\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\x14:\xff\xff\xff\xff\xff\xff\xff\x00\xff\xff\xff\xff\xff\xff\xff\xff\xff\xfe\xff\xff:/bin/qemu-ppc-static:OCF"
>>>>
>>>> With the 'F' flag, qemu-ppc-static will be then loaded from the main root
>>>> filesystem before switching to the chroot.
>>>>
>>>> Another example is to use the 'P' flag in one chroot and not in another one (useful in a test
>>>> environment to test different configurations of the same interpreter):
>>>>
>>>>  ./unshare --fork --pid --mount-proc --map-root-user --load-interp ":qemu-ppc:M::\x7fELF\x01\x02\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\x14:\xff\xff\xff\xff\xff\xff\xff\x00\xff\xff\xff\xff\xff\xff\xff\xff\xff\xfe\xff\xff://usr/bin/qemu-ppc-noargv0:OCF" --root=/chroot/powerpc/jessie /bin/bash -l
>>>>  root@localhost:/# sh -c 'echo $0'
>>>>  /bin/sh
>>>>
>>>>  ./unshare --fork --pid --mount-proc --map-root-user --load-interp ":qemu-ppc:M::\x7fELF\x01\x02\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\x14:\xff\xff\xff\xff\xff\xff\xff\x00\xff\xff\xff\xff\xff\xff\xff\xff\xff\xfe\xff\xff://usr/bin/qemu-ppc-argv0:OCFP" --root=/chroot/powerpc/jessie /bin/bash -l
>>>>  root@localhost:/# sh -c 'echo $0'
>>>>  sh
>>>
>>> Hey Laurent,
>>>
>>> We have quite some time before the v5.6 merge window opens. So I would
>>> really like for this new feature to come with proper testing!
>>
>> Are there some already existing tests for binfmt_misc or namespace I can
>> update to test the new feature?
>
> I don't think so but there are tests for other namespace-aware
> filesystem. For example, I've added basic tests for binderfs in
> tools/testing/selftests/filesystems/binderfs/ and there are some devpts
> tests in there (Though the devpts tests don't actually make use of the
> kselftest framework so they aren't a great example. I'm not claiming
> binderfs is either tbh. :))
>
> You can just place the binfmt_misc tests in there. Helpers for setting
> up user namespace and mappings are in there as well. I think you can
> just place them in a separate file/header and include it for both
> binderfs and binfmt_misc.
> I'm happy to review this/answer questions.
>

OK, thank you, I will try to add some tests here.

Thanks,
Laurent
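
For reference, the inheritance rule described in the v4 and v6 changelog
entries (use the first parent whose pointer is not NULL, and fall back to
the initial structure rather than returning NULL) can be sketched as a small
userspace model. This is only an illustration under stated assumptions: the
names user_ns_model, binfmt_ns_model, binfmt_ns_lookup() and
binfmt_model_selfcheck() are hypothetical and are not the kernel's
definitions, and the READ_ONCE()/WRITE_ONCE() and smp_wmb() ordering from v5
is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model; NOT the kernel's structures. */
struct binfmt_ns_model {
	int enabled;	/* stand-in for per-namespace binfmt_misc state */
};

struct user_ns_model {
	struct user_ns_model *parent;		/* NULL for the initial namespace */
	struct binfmt_ns_model *binfmt_ns;	/* NULL until binfmt_misc is mounted here */
};

/* The initial namespace always points at a valid structure (v4). */
static struct binfmt_ns_model init_binfmt_model = { .enabled = 1 };
static struct user_ns_model init_user_model = {
	.parent = NULL,
	.binfmt_ns = &init_binfmt_model,
};

/*
 * Walk up the parent chain and return the first non-NULL pointer.
 * If none is found (which should not happen), return the initial
 * structure instead of NULL, as the v6 entry describes.
 */
static struct binfmt_ns_model *binfmt_ns_lookup(const struct user_ns_model *ns)
{
	while (ns) {
		if (ns->binfmt_ns)
			return ns->binfmt_ns;
		ns = ns->parent;
	}
	return &init_binfmt_model;
}

/* Small self-check exercising inheritance and override; returns 0 on success. */
int binfmt_model_selfcheck(void)
{
	struct binfmt_ns_model child_state = { .enabled = 0 };
	struct user_ns_model child = { .parent = &init_user_model, .binfmt_ns = NULL };
	struct user_ns_model grandchild = { .parent = &child, .binfmt_ns = NULL };

	/* Nothing mounted below the initial namespace: inherit from it. */
	assert(binfmt_ns_lookup(&grandchild) == &init_binfmt_model);

	/* Models mounting binfmt_misc in "child": descendants now see its state. */
	child.binfmt_ns = &child_state;
	assert(binfmt_ns_lookup(&grandchild) == &child_state);

	/* The initial namespace is unaffected by the child's mount. */
	assert(binfmt_ns_lookup(&init_user_model) == &init_binfmt_model);
	return 0;
}
```

A kselftest for the real feature would presumably check the same two
properties from userspace: a fresh user namespace sees the parent's entries,
and stops seeing them once it mounts its own binfmt_misc instance.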