On Tue, Mar 16, 2021 at 6:02 PM Mickaël Salaün <mic@xxxxxxxxxxx> wrote:
> One could argue that chroot(2) is useless without a properly populated
> root hierarchy (i.e. without /dev and /proc).  However, there are
> multiple use cases that don't require the chrooting process to create
> file hierarchies with special files nor mount points, e.g.:
> * A process sandboxing itself, once all its libraries are loaded, may
>   not need files other than regular files, or even no file at all.
> * Some pre-populated root hierarchies could be used to chroot into,
>   provided for instance by development environments or tailored
>   distributions.
> * Processes executed in a chroot may not require access to these special
>   files (e.g. with minimal runtimes, or by emulating some special files
>   with a LD_PRELOADed library or seccomp).
>
> Unprivileged chroot is especially interesting for userspace developers
> wishing to harden their applications.  For instance, chroot(2) and Yama
> enable to build a capability-based security (i.e. remove filesystem
> ambient accesses) by calling chroot/chdir with an empty directory and
> accessing data through dedicated file descriptors obtained with
> openat2(2) and RESOLVE_BENEATH/RESOLVE_IN_ROOT/RESOLVE_NO_MAGICLINKS.

I don't entirely understand. Are you writing this with the assumption
that a future change will make it possible to set these RESOLVE flags
process-wide, or something like that?

As long as that doesn't exist, I think that to make this safe, you'd
have to do something like the following - let a child process set up a
new mount namespace for you, and then chroot() into that namespace's
root:

struct shared_data {
  int root_fd;
};
int helper_fn(void *args) {
  struct shared_data *shared = args;
  /* an empty tmpfs becomes the root of the child's mount namespace */
  mount("none", "/tmp", "tmpfs", MS_NOSUID|MS_NODEV, "");
  mkdir("/tmp/old_root", 0700);
  pivot_root("/tmp", "/tmp/old_root");
  /* after pivot_root() the old root is visible at /old_root;
     it is still in use, so detach it lazily */
  umount2("/old_root", MNT_DETACH);
  /* CLONE_FILES: this fd is also visible in the parent */
  shared->root_fd = open("/", O_PATH);
  return 0;
}
void setup_chroot() {
  struct shared_data shared = {};
  prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
  /* CLONE_VFORK: the parent resumes only after the helper has exited */
  clone(helper_fn, my_stack,
        CLONE_VFORK|CLONE_VM|CLONE_FILES|CLONE_NEWUSER|CLONE_NEWNS|SIGCHLD,
        &shared);
  fchdir(shared.root_fd);
  chroot(".");
}

[...]
> diff --git a/fs/open.c b/fs/open.c
[...]
> +static inline int current_chroot_allowed(void)
> +{
> +        /*
> +         * Changing the root directory for the calling task (and its future
> +         * children) requires that this task has CAP_SYS_CHROOT in its
> +         * namespace, or be running with no_new_privs and not sharing its
> +         * fs_struct and not escaping its current root (cf. create_user_ns()).
> +         * As for seccomp, checking no_new_privs avoids scenarios where
> +         * unprivileged tasks can affect the behavior of privileged children.
> +         */
> +        if (task_no_new_privs(current) && current->fs->users == 1 &&

this read of current->fs->users should be using READ_ONCE()

> +                        !current_chrooted())
> +                return 0;
> +        if (ns_capable(current_user_ns(), CAP_SYS_CHROOT))
> +                return 0;
> +        return -EPERM;
> +}
[...]

Overall I think this change is a good idea.