On 11/12/21 5:13 AM, Christian Brauner wrote: > On Thu, Nov 11, 2021 at 02:11:42PM -0800, Dave Marchevsky wrote: >> Since commit 73f03c2b4b52 ("fuse: Restrict allow_other to the >> superblock's namespace or a descendant"), access to allow_other FUSE >> filesystems has been limited to users in the mounting user namespace or >> descendants. This prevents a process that is privileged in its userns - >> but not its parent namespaces - from mounting a FUSE fs w/ allow_other >> that is accessible to processes in parent namespaces. >> >> While this restriction makes sense overall it breaks a legitimate >> usecase for me. I have a tracing daemon which needs to peek into >> process' open files in order to symbolicate - similar to 'perf'. The >> daemon is a privileged process in the root userns, but is unable to peek >> into FUSE filesystems mounted with allow_other by processes in child >> namespaces. >> >> This patch adds an escape hatch to the descendant userns logic >> specifically for processes with CAP_SYS_ADMIN in the root userns. Such >> processes can already do many dangerous things regardless of namespace, >> and moreover could fork and setns into any child userns with a FUSE >> mount, so it's reasonable to allow them to interact with all allow_other >> FUSE filesystems. >> >> Signed-off-by: Dave Marchevsky <davemarchevsky@xxxxxx> >> Cc: Miklos Szeredi <miklos@xxxxxxxxxx> >> Cc: Seth Forshee <sforshee@xxxxxxxxxxxxxxxx> >> Cc: Rik van Riel <riel@xxxxxxxxxxx> >> Cc: kernel-team@xxxxxx >> --- > > If your tracing daemon runs in init_user_ns with CAP_SYS_ADMIN why can't > it simply use a helper process/thread to > setns(userns_fd/pidfd, CLONE_NEWUSER) > to the target userns? This way we don't need to special-case > init_user_ns at all. helper process + setns could work for my usecase. But the fact that there's no way to say "I know what I am about to do is potentially stupid and dangerous, but I am root so let me do it", without spawning a helper process in this case, feels like it'll result in special-case userspace workarounds for anyone doing symbolication of backtraces. e.g. perf will have to add some logic: "did I fail to grab this exe that some process had mapped? Is it in a FUSE mounted by some descendant userns? let's fork a helper process..." Not the end of the world, but unnecessary complexity nonetheless. That being said, I agree that this patch's special-casing of init_user_ns is hacky. What do you think about a more explicit and general "let me do this stupid and dangerous thing" mechanism - perhaps a new struct fuse_conn field containing a set of exception userns', populated with ioctl or similar. > >> >> Note: I was unsure whether CAP_SYS_ADMIN or CAP_SYS_PTRACE was the best >> choice of capability here. Went with the former as it's checked >> elsewhere in fs/fuse while CAP_SYS_PTRACE isn't. >> >> fs/fuse/dir.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c >> index 0654bfedcbb0..2524eeb0f35d 100644 >> --- a/fs/fuse/dir.c >> +++ b/fs/fuse/dir.c >> @@ -1134,7 +1134,7 @@ int fuse_allow_current_process(struct fuse_conn *fc) >> const struct cred *cred; >> >> if (fc->allow_other) >> - return current_in_userns(fc->user_ns); >> + return current_in_userns(fc->user_ns) || capable(CAP_SYS_ADMIN); >> >> cred = current_cred(); >> if (uid_eq(cred->euid, fc->user_id) && >> -- >> 2.30.2 >> >>