On 10/03/2021 20:33, Jann Horn wrote:
> On Wed, Mar 10, 2021 at 8:23 PM Eric W. Biederman <ebiederm@xxxxxxxxxxxx> wrote:
>>
>> Mickaël Salaün <mic@xxxxxxxxxxx> writes:
>>
>>> From: Mickaël Salaün <mic@xxxxxxxxxxxxxxxxxxx>
>>>
>>> Being able to easily change root directories enable to ease some
>>> development workflow and can be used as a tool to strengthen
>>> unprivileged security sandboxes. chroot(2) is not an access-control
>>> mechanism per se, but it can be used to limit the absolute view of the
>>> filesystem, and then limit ways to access data and kernel interfaces
>>> (e.g. /proc, /sys, /dev, etc.).
>>>
>>> Users may not wish to expose namespace complexity to potentially
>>> malicious processes, or limit their use because of limited resources.
>>> The chroot feature is much more simple (and limited) than the mount
>>> namespace, but can still be useful. As for containers, users of
>>> chroot(2) should take care of file descriptors or data accessible by
>>> other means (e.g. current working directory, leaked FDs, passed FDs,
>>> devices, mount points, etc.). There is a lot of literature that discuss
>>> the limitations of chroot, and users of this feature should be aware of
>>> the multiple ways to bypass it. Using chroot(2) for security purposes
>>> can make sense if it is combined with other features (e.g. dedicated
>>> user, seccomp, LSM access-controls, etc.).
>>>
>>> One could argue that chroot(2) is useless without a properly populated
>>> root hierarchy (i.e. without /dev and /proc). However, there are
>>> multiple use cases that don't require the chrooting process to create
>>> file hierarchies with special files nor mount points, e.g.:
>>> * A process sandboxing itself, once all its libraries are loaded, may
>>>   not need files other than regular files, or even no file at all.
>>> * Some pre-populated root hierarchies could be used to chroot into,
>>>   provided for instance by development environments or tailored
>>>   distributions.
>>> * Processes executed in a chroot may not require access to these special
>>>   files (e.g. with minimal runtimes, or by emulating some special files
>>>   with a LD_PRELOADed library or seccomp).
>>>
>>> Allowing a task to change its own root directory is not a threat to the
>>> system if we can prevent confused deputy attacks, which could be
>>> performed through execution of SUID-like binaries. This can be
>>> prevented if the calling task sets PR_SET_NO_NEW_PRIVS on itself with
>>> prctl(2). To only affect this task, its filesystem information must not
>>> be shared with other tasks, which can be achieved by not passing
>>> CLONE_FS to clone(2). A similar no_new_privs check is already used by
>>> seccomp to avoid the same kind of security issues. Furthermore, because
>>> of its security use and to avoid giving a new way for attackers to get
>>> out of a chroot (e.g. using /proc/<pid>/root), an unprivileged chroot is
>>> only allowed if the new root directory is the same or beneath the
>>> current one. This still allows a process to use a subset of its
>>> legitimate filesystem to chroot into and then further reduce its view of
>>> the filesystem.
>>>
>>> This change may not impact systems relying on other permission models
>>> than POSIX capabilities (e.g. Tomoyo). Being able to use chroot(2) on
>>> such systems may require to update their security policies.
>>>
>>> Only the chroot system call is relaxed with this no_new_privs check; the
>>> init_chroot() helper doesn't require such change.
>>>
>>> Allowing unprivileged users to use chroot(2) is one of the initial
>>> objectives of no_new_privs:
>>> https://www.kernel.org/doc/html/latest/userspace-api/no_new_privs.html
>>> This patch is a follow-up of a previous one sent by Andy Lutomirski, but
>>> with less limitations:
>>> https://lore.kernel.org/lkml/0e2f0f54e19bff53a3739ecfddb4ffa9a6dbde4d.1327858005.git.luto@xxxxxxxxxxxxxx/
> [...]
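
For readers following the thread, the userspace side of the sequence in the patch description could look roughly like the sketch below. It is only an illustration: sandbox_chroot() is a hypothetical helper name, and calling unshare(CLONE_FS) is an assumption about how a task might stop sharing its filesystem information (the description only says CLONE_FS must not have been passed to clone(2)). On a kernel without the proposed change, the chroot(2) call still requires CAP_SYS_CHROOT and is expected to fail with EPERM for an unprivileged caller.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <stddef.h>
#include <sys/prctl.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of the patch): confine the calling
 * process to new_root using the steps from the patch description.
 * Returns 0 on success or a negative errno value on failure.
 */
static int sandbox_chroot(const char *new_root)
{
	assert(new_root != NULL);

	/* Irreversibly give up gaining privileges through execve(2). */
	if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
		return -errno;

	/*
	 * Assumption: make sure root/cwd are not shared with another
	 * task that was created with CLONE_FS.
	 */
	if (unshare(CLONE_FS) != 0)
		return -errno;

	/* Without the proposed change this needs CAP_SYS_CHROOT. */
	if (chroot(new_root) != 0)
		return -errno;

	/* Move the working directory inside the new root. */
	if (chdir("/") != 0)
		return -errno;

	return 0;
}
```

A caller would invoke sandbox_chroot() once its libraries and files are loaded, and treat any negative return value as "not sandboxed".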
>> Neither is_path_beneath nor path_is_under really help prevent escapes,
>> as except for open files and files accessible from proc chroot already
>> disallows going up. The reason is the path is resolved with the current
>> root before switching to it.
>
> Yeah, this probably should use the same check as the CLONE_NEWUSER
> logic, current_chrooted() from CLONE_NEWUSER; that check is already
> used for guarding against the following syscall sequence, which has
> similar security properties:
> unshare(CLONE_NEWUSER); // gives the current process namespaced CAP_SYS_ADMIN
> chroot("<...>"); // succeeds because of namespaced CAP_SYS_ADMIN
>
> The current_chrooted() check in create_user_ns() is for the same
> purpose as the check you're introducing here, so they should use the
> same logic.
>

I don't know how I missed this, but current_chrooted() is definitely the
right approach.
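
As a rough illustration of the two-call sequence quoted above, the sketch below runs it in a forked child so the caller's own namespaces and root stay untouched. userns_then_chroot() and its return codes are hypothetical names chosen for this example. Per unshare(2), the CLONE_NEWUSER step is expected to fail with EPERM when the caller is already in a chroot environment; it may also fail where unprivileged user namespaces are disabled or filtered.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: try unshare(CLONE_NEWUSER) followed by chroot(2)
 * in a child process.
 * Returns 0 if both calls succeeded, 1 if unshare() failed, 2 if
 * chroot() failed, or a negative errno value on other errors.
 */
static int userns_then_chroot(const char *new_root)
{
	int status;
	pid_t pid;

	assert(new_root != NULL);

	pid = fork();
	if (pid < 0)
		return -errno;

	if (pid == 0) {
		/*
		 * A new user namespace gives the caller a full capability
		 * set inside that namespace, including CAP_SYS_CHROOT.
		 */
		if (unshare(CLONE_NEWUSER) != 0)
			_exit(1);
		/* Permitted thanks to the namespaced capabilities. */
		if (chroot(new_root) != 0)
			_exit(2);
		_exit(0);
	}

	if (waitpid(pid, &status, 0) < 0)
		return -errno;

	return WIFEXITED(status) ? WEXITSTATUS(status) : -ECHILD;
}
```

Running it once from a normal process and once from an already chrooted one shows the case that the current_chrooted() check in create_user_ns() guards against: the second run should stop at the unshare() step.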