On Sat, Oct 6, 2018 at 9:36 PM Laurent Vivier <laurent@xxxxxxxxx> wrote:
> This patch allows to have a different binfmt_misc configuration
> for each new user namespace. By default, the binfmt_misc configuration
> is the one of the previous level, but if the binfmt_misc filesystem is
> mounted in the new namespace a new empty binfmt instance is created and
> used in this namespace.
>
> For instance, using "unshare" we can start a chroot of an another
> architecture and configure the binfmt_misc interpreter without being root
> to run the binaries in this chroot.
>
> Signed-off-by: Laurent Vivier <laurent@xxxxxxxxx>
> ---
[...]
> +static struct binfmt_namespace *binfmt_ns(struct user_namespace *ns)
> +{
> +	while (ns) {
> +		if (ns->binfmt_ns)
> +			return ns->binfmt_ns;
> +		ns = ns->parent;
> +	}
> +	return NULL;
> +}

If the value being read can change under you, please use READ_ONCE().

Also: That "return NULL" can never happen, right? You should probably at
least put a WARN(...) in there.

[...]
> @@ -838,7 +858,29 @@ static int bm_fill_super(struct super_block *sb, void *data, int silent)
>  static struct dentry *bm_mount(struct file_system_type *fs_type,
>  	int flags, const char *dev_name, void *data)
>  {
> -	return mount_single(fs_type, flags, data, bm_fill_super);
> +	struct user_namespace *ns = current_user_ns();
> +
> +	/* create a new binfmt namespace
> +	 * if we are not in the first user namespace
> +	 * but the binfmt namespace is the first one
> +	 */
> +	if (ns->binfmt_ns == NULL) {
> +		struct binfmt_namespace *new_ns;
> +
> +		new_ns = kmalloc(sizeof(struct binfmt_namespace),
> +				 GFP_KERNEL);
> +		if (new_ns == NULL)
> +			return ERR_PTR(-ENOMEM);
> +		INIT_LIST_HEAD(&new_ns->entries);
> +		new_ns->enabled = 1;
> +		rwlock_init(&new_ns->entries_lock);
> +		new_ns->bm_mnt = NULL;
> +		new_ns->entry_count = 0;
> +		ns->binfmt_ns = new_ns;

What happens if someone mounts two instances of the binfmt_misc
filesystem at the same time?
Would you end up creating two binfmt namespaces, one of which would
never be freed again?

> +	}
> +
> +	return mount_ns(fs_type, flags, data, ns, ns,
> +			bm_fill_super);
>  }
[...]
> diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
> index e5222b5fb4fe..da4950282ea1 100644
> --- a/kernel/user_namespace.c
> +++ b/kernel/user_namespace.c
> @@ -140,6 +140,10 @@ int create_user_ns(struct cred *new)
>  	if (!setup_userns_sysctls(ns))
>  		goto fail_keyring;
>
> +#if IS_ENABLED(CONFIG_BINFMT_MISC)
> +	ns->binfmt_ns = NULL;
> +#endif

Isn't this unnecessary? The namespace is allocated with all fields zeroed:

	ns = kmem_cache_zalloc(user_ns_cachep, GFP_KERNEL);
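Regarding the concurrent-mount question and the READ_ONCE()/WARN() remarks in this review, one possible shape is to publish the new instance with a compare-and-swap and have the losing mounter free its copy. The following is only a userspace model written with C11 <stdatomic.h>, not kernel code: struct user_ns_model, struct binfmt_ns_model, lookup_binfmt_ns() and get_or_create_binfmt_ns() are hypothetical stand-ins, and in the kernel the equivalent would presumably use cmpxchg() (or a lock) on the mount side and READ_ONCE() on the lookup side.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins for struct binfmt_namespace and
 * struct user_namespace; only the fields this sketch needs are modelled. */
struct binfmt_ns_model {
	int enabled;
	int entry_count;
};

struct user_ns_model {
	struct user_ns_model *parent;
	_Atomic(struct binfmt_ns_model *) binfmt_ns; /* NULL: inherit from parent */
};

/* Lookup side: walk up the parent chain. atomic_load() stands in for
 * READ_ONCE(); its default seq_cst ordering also makes the fields written
 * before publication visible to this reader. */
struct binfmt_ns_model *lookup_binfmt_ns(struct user_ns_model *ns)
{
	while (ns) {
		struct binfmt_ns_model *b = atomic_load(&ns->binfmt_ns);

		if (b)
			return b;
		ns = ns->parent;
	}
	/* Unreachable if the root namespace always has an instance.
	 * assert() is a crude stand-in for WARN(): unlike WARN(), it aborts. */
	assert(!"no binfmt namespace found");
	return NULL;
}

/* Mount side: install a fresh instance only if none exists yet. If a
 * concurrent caller wins the race, free our copy and use theirs, so at
 * most one instance is ever published and none is leaked. */
struct binfmt_ns_model *get_or_create_binfmt_ns(struct user_ns_model *ns)
{
	struct binfmt_ns_model *expected = NULL;
	struct binfmt_ns_model *new_ns;
	struct binfmt_ns_model *cur = atomic_load(&ns->binfmt_ns);

	if (cur)
		return cur;

	new_ns = calloc(1, sizeof(*new_ns)); /* zeroed, like kzalloc() */
	if (!new_ns)
		return NULL;
	new_ns->enabled = 1;

	if (!atomic_compare_exchange_strong(&ns->binfmt_ns, &expected, new_ns)) {
		/* Lost the race: expected now holds the winner's pointer. */
		free(new_ns);
		return expected;
	}
	return new_ns;
}
```

In this model the root namespace is given its own instance with atomic_init() and child namespaces start out NULL, so repeated calls to get_or_create_binfmt_ns() on the same child return the same pointer. Freeing the instance when the owning user namespace is destroyed is outside the scope of the sketch.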