On 3/30/2021 10:01 AM, Mickaël Salaün wrote: > Hi, > > Is there new comments on this patch? Could we move forward? I don't see that new comments are necessary when I don't see that you've provided compelling counters to some of the old ones. It's possible to use minimal privilege with CAP_SYS_CHROOT. It looks like namespaces provide alternatives for all your use cases. The constraints required to make this work are quite limiting. Where is the real value add? > > Regards, > Mickaël > > > On 16/03/2021 21:36, Mickaël Salaün wrote: >> From: Mickaël Salaün <mic@xxxxxxxxxxxxxxxxxxx> >> >> Being able to easily change root directories enables to ease some >> development workflow and can be used as a tool to strengthen >> unprivileged security sandboxes. chroot(2) is not an access-control >> mechanism per se, but it can be used to limit the absolute view of the >> filesystem, and then limit ways to access data and kernel interfaces >> (e.g. /proc, /sys, /dev, etc.). >> >> Users may not wish to expose namespace complexity to potentially >> malicious processes, or limit their use because of limited resources. >> The chroot feature is much more simple (and limited) than the mount >> namespace, but can still be useful. As for containers, users of >> chroot(2) should take care of file descriptors or data accessible by >> other means (e.g. current working directory, leaked FDs, passed FDs, >> devices, mount points, etc.). There is a lot of literature that discuss >> the limitations of chroot, and users of this feature should be aware of >> the multiple ways to bypass it. Using chroot(2) for security purposes >> can make sense if it is combined with other features (e.g. dedicated >> user, seccomp, LSM access-controls, etc.). >> >> One could argue that chroot(2) is useless without a properly populated >> root hierarchy (i.e. without /dev and /proc). However, there are >> multiple use cases that don't require the chrooting process to create >> file hierarchies with special files nor mount points, e.g.: >> * A process sandboxing itself, once all its libraries are loaded, may >> not need files other than regular files, or even no file at all. >> * Some pre-populated root hierarchies could be used to chroot into, >> provided for instance by development environments or tailored >> distributions. >> * Processes executed in a chroot may not require access to these special >> files (e.g. with minimal runtimes, or by emulating some special files >> with a LD_PRELOADed library or seccomp). >> >> Allowing a task to change its own root directory is not a threat to the >> system if we can prevent confused deputy attacks, which could be >> performed through execution of SUID-like binaries. This can be >> prevented if the calling task sets PR_SET_NO_NEW_PRIVS on itself with >> prctl(2). To only affect this task, its filesystem information must not >> be shared with other tasks, which can be achieved by not passing >> CLONE_FS to clone(2). A similar no_new_privs check is already used by >> seccomp to avoid the same kind of security issues. Furthermore, because >> of its security use and to avoid giving a new way for attackers to get >> out of a chroot (e.g. using /proc/<pid>/root, or chroot/chdir), an >> unprivileged chroot is only allowed if the calling process is not >> already chrooted. This limitation is the same as for creating user >> namespaces. >> >> This change may not impact systems relying on other permission models >> than POSIX capabilities (e.g. Tomoyo). Being able to use chroot(2) on >> such systems may require to update their security policies. >> >> Only the chroot system call is relaxed with this no_new_privs check; the >> init_chroot() helper doesn't require such change. >> >> Allowing unprivileged users to use chroot(2) is one of the initial >> objectives of no_new_privs: >> https://www.kernel.org/doc/html/latest/userspace-api/no_new_privs.html >> This patch is a follow-up of a previous one sent by Andy Lutomirski: >> https://lore.kernel.org/lkml/0e2f0f54e19bff53a3739ecfddb4ffa9a6dbde4d.1327858005.git.luto@xxxxxxxxxxxxxx/ >> >> Cc: Al Viro <viro@xxxxxxxxxxxxxxxxxx> >> Cc: Andy Lutomirski <luto@xxxxxxxxxxxxxx> >> Cc: Christian Brauner <christian.brauner@xxxxxxxxxx> >> Cc: Christoph Hellwig <hch@xxxxxx> >> Cc: David Howells <dhowells@xxxxxxxxxx> >> Cc: Dominik Brodowski <linux@xxxxxxxxxxxxxxxxxxxx> >> Cc: Eric W. Biederman <ebiederm@xxxxxxxxxxxx> >> Cc: James Morris <jmorris@xxxxxxxxx> >> Cc: Jann Horn <jannh@xxxxxxxxxx> >> Cc: John Johansen <john.johansen@xxxxxxxxxxxxx> >> Cc: Kentaro Takeda <takedakn@xxxxxxxxxxxxx> >> Cc: Serge Hallyn <serge@xxxxxxxxxx> >> Cc: Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx> >> Signed-off-by: Mickaël Salaün <mic@xxxxxxxxxxxxxxxxxxx> >> Reviewed-by: Kees Cook <keescook@xxxxxxxxxxxx> >> Link: https://lore.kernel.org/r/20210316203633.424794-2-mic@xxxxxxxxxxx >> --- >> >> Changes since v4: >> * Use READ_ONCE(current->fs->users) (found by Jann Horn). >> * Remove ambiguous example in commit description. >> * Add Reviewed-by Kees Cook. >> >> Changes since v3: >> * Move the new permission checks to a dedicated helper >> current_chroot_allowed() to make the code easier to read and align >> with user_path_at(), path_permission() and security_path_chroot() >> calls (suggested by Kees Cook). >> * Remove now useless included file. >> * Extend commit description. >> * Rebase on v5.12-rc3 . >> >> Changes since v2: >> * Replace path_is_under() check with current_chrooted() to gain the same >> protection as create_user_ns() (suggested by Jann Horn). See commit >> 3151527ee007 ("userns: Don't allow creation if the user is chrooted") >> >> Changes since v1: >> * Replace custom is_path_beneath() with existing path_is_under(). >> --- >> fs/open.c | 23 +++++++++++++++++++++-- >> 1 file changed, 21 insertions(+), 2 deletions(-) >> >> diff --git a/fs/open.c b/fs/open.c >> index e53af13b5835..480010a551b2 100644 >> --- a/fs/open.c >> +++ b/fs/open.c >> @@ -532,6 +532,24 @@ SYSCALL_DEFINE1(fchdir, unsigned int, fd) >> return error; >> } >> >> +static inline int current_chroot_allowed(void) >> +{ >> + /* >> + * Changing the root directory for the calling task (and its future >> + * children) requires that this task has CAP_SYS_CHROOT in its >> + * namespace, or be running with no_new_privs and not sharing its >> + * fs_struct and not escaping its current root (cf. create_user_ns()). >> + * As for seccomp, checking no_new_privs avoids scenarios where >> + * unprivileged tasks can affect the behavior of privileged children. >> + */ >> + if (task_no_new_privs(current) && READ_ONCE(current->fs->users) == 1 && >> + !current_chrooted()) >> + return 0; >> + if (ns_capable(current_user_ns(), CAP_SYS_CHROOT)) >> + return 0; >> + return -EPERM; >> +} >> + >> SYSCALL_DEFINE1(chroot, const char __user *, filename) >> { >> struct path path; >> @@ -546,9 +564,10 @@ SYSCALL_DEFINE1(chroot, const char __user *, filename) >> if (error) >> goto dput_and_out; >> >> - error = -EPERM; >> - if (!ns_capable(current_user_ns(), CAP_SYS_CHROOT)) >> + error = current_chroot_allowed(); >> + if (error) >> goto dput_and_out; >> + >> error = security_path_chroot(&path); >> if (error) >> goto dput_and_out; >>