On 16/12/2019 at 10:46, Christian Brauner wrote:
> On Mon, Dec 16, 2019 at 10:12:19AM +0100, Laurent Vivier wrote:
>> v8: s/file->f_path.dentry/file_dentry(file)/
>>
>> v7: Use the new mount API
>>
>>     Replace
>>
>>     static struct dentry *bm_mount(struct file_system_type *fs_type,
>>                         int flags, const char *dev_name, void *data)
>>     {
>>             struct user_namespace *ns = current_user_ns();
>>
>>             return mount_ns(fs_type, flags, data, ns, ns,
>>                             bm_fill_super);
>>     }
>>
>>     by
>>
>>     static void bm_free(struct fs_context *fc)
>>     {
>>             if (fc->s_fs_info)
>>                     put_user_ns(fc->s_fs_info);
>>     }
>>
>>     static int bm_get_tree(struct fs_context *fc)
>>     {
>>             return get_tree_keyed(fc, bm_fill_super, get_user_ns(fc->user_ns));
>>     }
>>
>>     static const struct fs_context_operations bm_context_ops = {
>>             .free           = bm_free,
>>             .get_tree       = bm_get_tree,
>>     };
>>
>>     static int bm_init_fs_context(struct fs_context *fc)
>>     {
>>             fc->ops = &bm_context_ops;
>>             return 0;
>>     }
>>
>> v6: Return &init_binfmt_ns instead of NULL in binfmt_ns()
>>     This should never happen, but to stay safe return a
>>     value we can use.
>>     change subject from "RFC" to "PATCH"
>>
>> v5: Use READ_ONCE()/WRITE_ONCE()
>>     move mount pointer struct init to bm_fill_super() and add smp_wmb()
>>     remove useless NULL value init
>>     add WARN_ON_ONCE()
>>
>> v4: first user namespace is initialized with &init_binfmt_ns,
>>     all new user namespaces are initialized with a NULL and use
>>     the one of the first parent that is not NULL. The pointer
>>     is initialized to a valid value the first time the binfmt_misc
>>     fs is mounted in the current user namespace.
>>     This allows to not change the way it was working before:
>>     new ns inherits values from its parent, and if parent value is modified
>>     (or parent creates its own binfmt entry by mounting the fs) child
>>     inherits it (unless it has itself mounted the fs).
>>
>> v3: create a structure to store binfmt_misc data,
>>     add a pointer to this structure in the user_namespace structure,
>>     in init_user_ns structure this pointer points to an init_binfmt_ns
>>     structure. And all new user namespaces point to this init structure.
>>     A new binfmt namespace structure is allocated if the binfmt_misc
>>     filesystem is mounted in a user namespace that is not the initial
>>     one but its binfmt namespace pointer points to the initial one.
>>     add override_creds()/revert_creds() around open_exec() in
>>     bm_register_write()
>>
>> v2: no new namespace, binfmt_misc data are now part of
>>     the mount namespace
>>     I put this in mount namespace instead of user namespace
>>     because the mount namespace is already needed and
>>     I don't want to force to have the user namespace for that.
>>     As this is a filesystem, it seems logic to have it here.
>>
>> This allows to define a new interpreter for each new container.
>>
>> But the main goal is to be able to chroot to a directory
>> using a binfmt_misc interpreter without being root.
>>
>> I have a modified version of unshare at:
>>
>>   https://github.com/vivier/util-linux.git branch unshare-chroot
>>
>> with some new options to unshare binfmt_misc namespace and to chroot
>> to a directory.
>>
>> If you have a directory /chroot/powerpc/jessie containing debian for powerpc
>> binaries and a qemu-ppc interpreter, you can do for instance:
>>
>>  $ uname -a
>>  Linux fedora28-wor-2 4.19.0-rc5+ #18 SMP Mon Oct 1 00:32:34 CEST 2018 x86_64 x86_64 x86_64 GNU/Linux
>>  $ ./unshare --map-root-user --fork --pid \
>>    --load-interp ":qemu-ppc:M::\x7fELF\x01\x02\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\x14:\xff\xff\xff\xff\xff\xff\xff\x00\xff\xff\xff\xff\xff\xff\xff\xff\xff\xfe\xff\xff:/qemu-ppc:OC" \
>>    --root=/chroot/powerpc/jessie /bin/bash -l
>>  # uname -a
>>  Linux fedora28-wor-2 4.19.0-rc5+ #18 SMP Mon Oct 1 00:32:34 CEST 2018 ppc GNU/Linux
>>  # id
>>  uid=0(root) gid=0(root) groups=0(root),65534(nogroup)
>>  # ls -l
>>  total 5940
>>  drwxr-xr-x.   2 nobody nogroup    4096 Aug 12 00:58 bin
>>  drwxr-xr-x.   2 nobody nogroup    4096 Jun 17 20:26 boot
>>  drwxr-xr-x.   4 nobody nogroup    4096 Aug 12 00:08 dev
>>  drwxr-xr-x.  42 nobody nogroup    4096 Sep 28 07:25 etc
>>  drwxr-xr-x.   3 nobody nogroup    4096 Sep 28 07:25 home
>>  drwxr-xr-x.   9 nobody nogroup    4096 Aug 12 00:58 lib
>>  drwxr-xr-x.   2 nobody nogroup    4096 Aug 12 00:08 media
>>  drwxr-xr-x.   2 nobody nogroup    4096 Aug 12 00:08 mnt
>>  drwxr-xr-x.   3 nobody nogroup    4096 Aug 12 13:09 opt
>>  dr-xr-xr-x. 143 nobody nogroup       0 Sep 30 23:02 proc
>>  -rwxr-xr-x.   1 nobody nogroup 6009712 Sep 28 07:22 qemu-ppc
>>  drwx------.   3 nobody nogroup    4096 Aug 12 12:54 root
>>  drwxr-xr-x.   3 nobody nogroup    4096 Aug 12 00:08 run
>>  drwxr-xr-x.   2 nobody nogroup    4096 Aug 12 00:58 sbin
>>  drwxr-xr-x.   2 nobody nogroup    4096 Aug 12 00:08 srv
>>  drwxr-xr-x.   2 nobody nogroup    4096 Apr  6  2015 sys
>>  drwxrwxrwt.   2 nobody nogroup    4096 Sep 28 10:31 tmp
>>  drwxr-xr-x.  10 nobody nogroup    4096 Aug 12 00:08 usr
>>  drwxr-xr-x.  11 nobody nogroup    4096 Aug 12 00:08 var
>>
>> If you want to use the qemu binary provided by your distro, you can use
>>
>>  --load-interp ":qemu-ppc:M::\x7fELF\x01\x02\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\x14:\xff\xff\xff\xff\xff\xff\xff\x00\xff\xff\xff\xff\xff\xff\xff\xff\xff\xfe\xff\xff:/bin/qemu-ppc-static:OCF"
>>
>> With the 'F' flag, qemu-ppc-static will be then loaded from the main root
>> filesystem before switching to the chroot.
>>
>> Another example is to use the 'P' flag in one chroot and not in another one (useful in a test
>> environment to test different configurations of the same interpreter):
>>
>>  ./unshare --fork --pid --mount-proc --map-root-user --load-interp ":qemu-ppc:M::\x7fELF\x01\x02\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\x14:\xff\xff\xff\xff\xff\xff\xff\x00\xff\xff\xff\xff\xff\xff\xff\xff\xff\xfe\xff\xff://usr/bin/qemu-ppc-noargv0:OCF" --root=/chroot/powerpc/jessie /bin/bash -l
>>  root@localhost:/# sh -c 'echo $0'
>>  /bin/sh
>>
>>  ./unshare --fork --pid --mount-proc --map-root-user --load-interp ":qemu-ppc:M::\x7fELF\x01\x02\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\x14:\xff\xff\xff\xff\xff\xff\xff\x00\xff\xff\xff\xff\xff\xff\xff\xff\xff\xfe\xff\xff://usr/bin/qemu-ppc-argv0:OCFP" --root=/chroot/powerpc/jessie /bin/bash -l
>>  root@localhost:/# sh -c 'echo $0'
>>  sh
>
> Hey Laurent,
>
> We have quite some time before the v5.6 merge window opens. So I would
> really like for this new feature to come with proper testing!

Are there any existing tests for binfmt_misc or namespaces that I can
update to test the new feature?

Thanks,
Laurent
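
For reference, the --load-interp strings quoted above appear to follow the
binfmt_misc registration format (:name:type:offset:magic:mask:interpreter:flags),
where the first character is the field delimiter. Below is a minimal Python
sketch that splits such a string into its fields and names the flags. The
helper names (BinfmtEntry, parse_rule, describe_flags) are hypothetical and
only for illustration; the sketch does not unescape the \x sequences and does
not reproduce the kernel's own validation.

```python
from dataclasses import dataclass

# Flag letters as described in the kernel's binfmt_misc documentation.
FLAG_NAMES = {
    "P": "preserve-argv[0]",
    "O": "open-binary",
    "C": "credentials",
    "F": "fix-binary",
}


@dataclass
class BinfmtEntry:
    """Hypothetical container for the seven registration fields."""
    name: str
    type: str          # "M" (magic) or "E" (extension)
    offset: str
    magic: str         # kept as written, \x escapes are not decoded here
    mask: str
    interpreter: str
    flags: str


def parse_rule(rule: str) -> BinfmtEntry:
    """Split a registration string on its leading delimiter character."""
    if len(rule) < 2:
        raise ValueError("rule too short")
    delim = rule[0]
    fields = rule[1:].split(delim)
    if len(fields) != 7:
        raise ValueError(f"expected 7 fields, got {len(fields)}")
    entry = BinfmtEntry(*fields)
    if entry.type not in ("M", "E"):
        raise ValueError(f"unknown type {entry.type!r}")
    unknown = set(entry.flags) - set(FLAG_NAMES)
    if unknown:
        raise ValueError(f"unknown flags: {sorted(unknown)}")
    return entry


def describe_flags(entry: BinfmtEntry) -> list:
    """Return the descriptive name of each flag, in the order given."""
    return [FLAG_NAMES[f] for f in entry.flags]


if __name__ == "__main__":
    # Raw string so the backslashes stay literal, as on the command line.
    rule = (r":qemu-ppc:M::\x7fELF\x01\x02\x01\x00\x00\x00\x00\x00\x00\x00"
            r"\x00\x00\x00\x02\x00\x14:\xff\xff\xff\xff\xff\xff\xff\x00\xff"
            r"\xff\xff\xff\xff\xff\xff\xff\xff\xfe\xff\xff"
            r":/bin/qemu-ppc-static:OCF")
    entry = parse_rule(rule)
    print(entry.name, entry.interpreter, describe_flags(entry))
```

A parser like this could help a test script check which flags (for instance
'F' or 'P') a given configuration requests before comparing the behaviour
observed inside each chroot.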