On Wed, Oct 3, 2018 at 8:07 AM Eric W. Biederman <ebiederm@xxxxxxxxxxxx> wrote:
> Laurent Vivier <laurent@xxxxxxxxx> writes:
> > This patch allows to have a different binfmt_misc configuration
> > in each container we mount binfmt_misc filesystem with mount namespace
> > enabled.
> >
> > A container started without the CLONE_NEWNS will use the host binfmt_misc
> > configuration, otherwise the container starts with an empty binfmt_misc
> > interpreters list.
> >
> > For instance, using "unshare" we can start a chroot of another
> > architecture and configure the binfmt_misc interpreter without being root
> > to run the binaries in this chroot.
>
> A couple of things.
> As has already been mentioned on your previous version anything that
> comes through the filesystem interface needs to look up the local
> binfmt context not through current but through file->f_dentry->d_sb.
> AKA the superblock of the mounted filesystem.

Something else: bm_register_write() currently calls into open_exec(),
which uses the credentials of current, i.e. of whichever task happens to
perform the write(). That's not really allowed in a write handler - but so
far, it's not a big deal because only init-namespace root can reach this
code. Before you expose this stuff to unprivileged userspace, this needs
to get fixed; perhaps by wrapping the open_exec() call in
override_creds(file->f_cred) and revert_creds(), so that the lookup runs
with the credentials of whoever opened the file.
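
To illustrate the override_creds()/revert_creds() pattern, here is a rough
userspace model, not kernel code and not a patch. Everything in it is a
stand-in I made up for illustration: struct cred only carries a uid,
current_cred_ptr plays the role of current's credentials,
open_exec_model() is a hypothetical permission check that only lets uid 0
through, and register_write_model() stands in for the relevant part of
bm_register_write(). The real change would need proper error handling
around open_exec().

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's struct cred; only a uid is modelled. */
struct cred {
	unsigned int uid;
};

/* Stand-in for struct file: remembers the credentials of the opener. */
struct file {
	const struct cred *f_cred;
};

/* Stand-in for the credentials of "current" (the task doing the write). */
static const struct cred *current_cred_ptr;

/* Model of override_creds(): install new creds, hand back the old ones. */
static const struct cred *override_creds(const struct cred *new_cred)
{
	const struct cred *old = current_cred_ptr;

	current_cred_ptr = new_cred;
	return old;
}

/* Model of revert_creds(): restore what override_creds() returned. */
static void revert_creds(const struct cred *old)
{
	current_cred_ptr = old;
}

/*
 * Hypothetical model of open_exec(): the permission check uses the
 * current credentials. Assumption for this sketch: only uid 0 may open
 * the interpreter. Returns 0 on success, -1 on denial.
 */
static int open_exec_model(const char *path)
{
	(void)path;
	return current_cred_ptr->uid == 0 ? 0 : -1;
}

/*
 * Model of the suggested fix: run the lookup with the opener's
 * credentials, then restore the writer's credentials on every path.
 */
static int register_write_model(const struct file *file, const char *path)
{
	const struct cred *old;
	int ret;

	assert(file != NULL && file->f_cred != NULL);

	old = override_creds(file->f_cred);
	ret = open_exec_model(path);
	revert_creds(old);

	return ret;
}
```

In this model, if an unprivileged task opens the register file and then
gets a privileged task to write to it, the bare open_exec_model() call
would pass the check with the writer's credentials, while
register_write_model() is judged on the opener's credentials and is
denied.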