On Wed, Mar 10, 2021 at 8:23 PM Eric W. Biederman <ebiederm@xxxxxxxxxxxx> wrote:
>
> Mickaël Salaün <mic@xxxxxxxxxxx> writes:
>
> > From: Mickaël Salaün <mic@xxxxxxxxxxxxxxxxxxx>
> >
> > Being able to easily change root directories enable to ease some
> > development workflow and can be used as a tool to strengthen
> > unprivileged security sandboxes. chroot(2) is not an access-control
> > mechanism per se, but it can be used to limit the absolute view of the
> > filesystem, and then limit ways to access data and kernel interfaces
> > (e.g. /proc, /sys, /dev, etc.).
> >
> > Users may not wish to expose namespace complexity to potentially
> > malicious processes, or limit their use because of limited resources.
> > The chroot feature is much more simple (and limited) than the mount
> > namespace, but can still be useful. As for containers, users of
> > chroot(2) should take care of file descriptors or data accessible by
> > other means (e.g. current working directory, leaked FDs, passed FDs,
> > devices, mount points, etc.). There is a lot of literature that discuss
> > the limitations of chroot, and users of this feature should be aware of
> > the multiple ways to bypass it. Using chroot(2) for security purposes
> > can make sense if it is combined with other features (e.g. dedicated
> > user, seccomp, LSM access-controls, etc.).
> >
> > One could argue that chroot(2) is useless without a properly populated
> > root hierarchy (i.e. without /dev and /proc). However, there are
> > multiple use cases that don't require the chrooting process to create
> > file hierarchies with special files nor mount points, e.g.:
> > * A process sandboxing itself, once all its libraries are loaded, may
> >   not need files other than regular files, or even no file at all.
> > * Some pre-populated root hierarchies could be used to chroot into,
> >   provided for instance by development environments or tailored
> >   distributions.
> > * Processes executed in a chroot may not require access to these special
> >   files (e.g. with minimal runtimes, or by emulating some special files
> >   with a LD_PRELOADed library or seccomp).
> >
> > Allowing a task to change its own root directory is not a threat to the
> > system if we can prevent confused deputy attacks, which could be
> > performed through execution of SUID-like binaries. This can be
> > prevented if the calling task sets PR_SET_NO_NEW_PRIVS on itself with
> > prctl(2). To only affect this task, its filesystem information must not
> > be shared with other tasks, which can be achieved by not passing
> > CLONE_FS to clone(2). A similar no_new_privs check is already used by
> > seccomp to avoid the same kind of security issues. Furthermore, because
> > of its security use and to avoid giving a new way for attackers to get
> > out of a chroot (e.g. using /proc/<pid>/root), an unprivileged chroot is
> > only allowed if the new root directory is the same or beneath the
> > current one. This still allows a process to use a subset of its
> > legitimate filesystem to chroot into and then further reduce its view of
> > the filesystem.
> >
> > This change may not impact systems relying on other permission models
> > than POSIX capabilities (e.g. Tomoyo). Being able to use chroot(2) on
> > such systems may require to update their security policies.
> >
> > Only the chroot system call is relaxed with this no_new_privs check; the
> > init_chroot() helper doesn't require such change.
> >
> > Allowing unprivileged users to use chroot(2) is one of the initial
> > objectives of no_new_privs:
> > https://www.kernel.org/doc/html/latest/userspace-api/no_new_privs.html
> > This patch is a follow-up of a previous one sent by Andy Lutomirski, but
> > with less limitations:
> > https://lore.kernel.org/lkml/0e2f0f54e19bff53a3739ecfddb4ffa9a6dbde4d.1327858005.git.luto@xxxxxxxxxxxxxx/
[...]
> Neither is_path_beneath nor path_is_under really help prevent escapes,
> as except for open files and files accessible from proc chroot already
> disallows going up. The reason is the path is resolved with the current
> root before switching to it.

Yeah, this probably should use the same check as the CLONE_NEWUSER
logic, current_chrooted() from CLONE_NEWUSER; that check is already
used for guarding against the following syscall sequence, which has
similar security properties:

    unshare(CLONE_NEWUSER); // gives the current process namespaced CAP_SYS_ADMIN
    chroot("<...>"); // succeeds because of namespaced CAP_SYS_ADMIN

The current_chrooted() check in create_user_ns() is for the same
purpose as the check you're introducing here, so they should use the
same logic.
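
For anyone who wants to try that sequence from userspace, here is a
minimal sketch, not taken from the patch. The helper name
try_userns_chroot() and the SEQ_* result codes are made up for
illustration. It assumes a Linux system with glibc; whether
unshare(CLONE_NEWUSER) succeeds depends on the local configuration
(for example, whether unprivileged user namespaces are enabled and
whether the caller is already chrooted), so the helper only reports
which step failed instead of assuming an outcome. The sequence runs in
a forked child so that the caller's namespace and root are left alone.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical result codes for this sketch. */
enum {
	SEQ_OK = 0,             /* both unshare() and chroot() succeeded */
	SEQ_UNSHARE_FAILED = 1, /* e.g. userns disabled, or caller is chrooted */
	SEQ_CHROOT_FAILED = 2,  /* e.g. new_root missing or not searchable */
	SEQ_ERROR = 3,          /* fork()/waitpid() problem or child was killed */
};

/*
 * Run unshare(CLONE_NEWUSER) followed by chroot(new_root) in a child
 * process and report how far the sequence got.
 */
static int try_userns_chroot(const char *new_root)
{
	int status;
	pid_t ret;
	pid_t pid = fork();

	if (pid < 0)
		return SEQ_ERROR;

	if (pid == 0) {
		/*
		 * On success the child holds a full capability set in the
		 * new user namespace; chroot(2) checks CAP_SYS_CHROOT there.
		 */
		if (unshare(CLONE_NEWUSER) != 0)
			_exit(SEQ_UNSHARE_FAILED);
		if (chroot(new_root) != 0)
			_exit(SEQ_CHROOT_FAILED);
		_exit(SEQ_OK);
	}

	/* Wait for the child, retrying if a signal interrupts the wait. */
	do {
		ret = waitpid(pid, &status, 0);
	} while (ret < 0 && errno == EINTR);

	if (ret != pid || !WIFEXITED(status))
		return SEQ_ERROR;
	return WEXITSTATUS(status);
}
```

Calling try_userns_chroot("/") from an ordinary shell and again from
inside an existing chroot should show the difference the
current_chrooted() check makes: in the second case the expectation is
SEQ_UNSHARE_FAILED, because create_user_ns() refuses to run for a
chrooted caller.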