On Mon, May 10, 2021 at 9:04 PM Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> wrote: > > On Mon, May 10, 2021 at 12:22:47PM -0500, YiFei Zhu wrote: > > > > +BPF_CALL_3(bpf_probe_read_user_dumpable, void *, dst, u32, size, > > + const void __user *, unsafe_ptr) > > +{ > > + int ret = -EPERM; > > + > > + if (get_dumpable(current->mm)) > > + ret = copy_from_user_nofault(dst, unsafe_ptr, size); > > Could you explain a bit more how dumpable flag makes it safe for unpriv? > The unpriv prog is attached to the children tasks only, right? > and dumpable gets cleared if euid changes? This is the "reduction to ptrace". The model here is that the eBPF seccomp filter is doing the equivalent of ptracing the user process using the privileges of the task at the time of loading the seccomp filter. ptrace access control is governed by ptrace.c:__ptrace_may_access. The requirements are: * always allow thread group introspection -- assume false so we are more restrictive than ptrace. * tracer has CAP_PTRACE in the target user namespace or tracer r/fsu/gidid equal target resu/gid -- discuss below * tracer has CAP_PTRACE in the target user namespace or target is SUID_DUMP_USER (I realized I should probably change the condition to == SUID_DUMP_USER). * passes LSM checks (eg yama ptrace_scope) -- we expose a hook to LSM but it's more of a "disable all advanced seccomp-eBPF features". How would a better interface to LSM look like? The dumpable check handles the "target is SUID_DUMP_USER" condition, in the circumstance that the loader does not have CAP_PTRACE in its namespace at the time of load. Why would this imply its CAP_PTRACE capability in target namespace? This is based on my understanding on how capabilities and user namespaces interact: For the sake of simplicity, let's first assume that loader is the same task as the task that attaches the filter (via prctl or seccomp syscall). * Case 1: target and loader are the same user namespace. Trivial case, the two operations are the same. * Case 2: target is loader's parent namespace. Can't happen under assumption. Seccomp affects itself and children only, and it is only possible to join a descendant user ns. * Case 3: target is loader's descendant namespace. Loader would have full CAP_PTRACE on target. We are more restrictive than ptrace. * Case 4: target and loader are on unrelated namespace branches. Can't happen under assumption. Same as case 2. Let's break this assumption and see what happens if the loader and attacher are in different contexts: * Case 1: attacher is less capable (as a general term of "what it can do") than loader then all of the above applies, since the model concerns and checks the capabilities of the loader. * Case 2: attacher is more capable than loader. The attacher would need an fd to the prog to attach it: * subcase 1: attacher inherited the fd after an exec and became more capable. uh... why is it trusting fds from a less capable context? * subcase 2: attacher has CAP_SYS_ADMIN and gets the fd via BPF_PROG_GET_FD_BY_ID. uh... why is it trusting random fds and attaching it? * subcase 3: attacher received the fd via a domain socket from a process which may be in a different user namespace. On my first thought, I thought, why is it trusting random fds from a less capable context? Except I just thought of an adversary could: * Clone into new userns, * Load filter in child, which has CAP_PTRACE in new userns * Send filter to the parent which doesn't have CAP_PTRACE in its userns * It's broken :( We'll think more about this case. One way is to check against init namespace, which means unpriv container runtimes won't have the non-dumpable override. Though, it shouldn't be affecting most of the use cases. Alternatively we can store which userns it was loaded from and reject attaching from a different userns. Regarding u/gids, for an attacher to attach a seccomp filter, whether cBPF or eBPF, if it doesn't have CAP_SYS_ADMIN in its current ns, it will have to set no_new_privs on itself before it can attach. (Unlike the previous discussion, this check is done at attach time rather than load.) With no_new_privs the target's privs is a subset of the attacher's, so the attacher should have a way to match the target's resuid, so this condition is not a concern. YiFei Zhu