On Tue, May 11, 2021 at 02:14:01AM -0500, YiFei Zhu wrote: > On Mon, May 10, 2021 at 9:04 PM Alexei Starovoitov > <alexei.starovoitov@xxxxxxxxx> wrote: > > > > On Mon, May 10, 2021 at 12:22:47PM -0500, YiFei Zhu wrote: > > > > > > +BPF_CALL_3(bpf_probe_read_user_dumpable, void *, dst, u32, size, > > > + const void __user *, unsafe_ptr) > > > +{ > > > + int ret = -EPERM; > > > + > > > + if (get_dumpable(current->mm)) > > > + ret = copy_from_user_nofault(dst, unsafe_ptr, size); > > > > Could you explain a bit more how dumpable flag makes it safe for unpriv? > > The unpriv prog is attached to the children tasks only, right? > > and dumpable gets cleared if euid changes? > > This is the "reduction to ptrace". The model here is that the eBPF > seccomp filter is doing the equivalent of ptracing the user process > using the privileges of the task at the time of loading the seccomp > filter. > > ptrace access control is governed by ptrace.c:__ptrace_may_access. The > requirements are: > * always allow thread group introspection -- assume false so we are > more restrictive than ptrace. > * tracer has CAP_PTRACE in the target user namespace or tracer > r/fsu/gidid equal target resu/gid -- discuss below > * tracer has CAP_PTRACE in the target user namespace or target is > SUID_DUMP_USER (I realized I should probably change the condition to > == SUID_DUMP_USER). > * passes LSM checks (eg yama ptrace_scope) -- we expose a hook to LSM > but it's more of a "disable all advanced seccomp-eBPF features". How > would a better interface to LSM look like? > > The dumpable check handles the "target is SUID_DUMP_USER" condition, > in the circumstance that the loader does not have CAP_PTRACE in its > namespace at the time of load. Why would this imply its CAP_PTRACE > capability in target namespace? This is based on my understanding on > how capabilities and user namespaces interact: > For the sake of simplicity, let's first assume that loader is the same > task as the task that attaches the filter (via prctl or seccomp > syscall). > * Case 1: target and loader are the same user namespace. Trivial case, > the two operations are the same. > * Case 2: target is loader's parent namespace. Can't happen under > assumption. Seccomp affects itself and children only, and it is only > possible to join a descendant user ns. > * Case 3: target is loader's descendant namespace. Loader would have > full CAP_PTRACE on target. We are more restrictive than ptrace. > * Case 4: target and loader are on unrelated namespace branches. Can't > happen under assumption. Same as case 2. > > Let's break this assumption and see what happens if the loader and > attacher are in different contexts: > * Case 1: attacher is less capable (as a general term of "what it can > do") than loader then all of the above applies, since the model > concerns and checks the capabilities of the loader. > * Case 2: attacher is more capable than loader. The attacher would > need an fd to the prog to attach it: > * subcase 1: attacher inherited the fd after an exec and became more > capable. uh... why is it trusting fds from a less capable context? > * subcase 2: attacher has CAP_SYS_ADMIN and gets the fd via > BPF_PROG_GET_FD_BY_ID. uh... why is it trusting random fds and > attaching it? > * subcase 3: attacher received the fd via a domain socket from a > process which may be in a different user namespace. On my first > thought, I thought, why is it trusting random fds from a less capable > context? Except I just thought of an adversary could: > * Clone into new userns, > * Load filter in child, which has CAP_PTRACE in new userns > * Send filter to the parent which doesn't have CAP_PTRACE in its userns > * It's broken :( > We'll think more about this case. One way is to check against init > namespace, which means unpriv container runtimes won't have the > non-dumpable override. Though, it shouldn't be affecting most of the > use cases. Alternatively we can store which userns it was loaded from > and reject attaching from a different userns. Typically the verifier does all the checks at load time to avoid run-time overhead during program execution. Then at attach time we check that attach parameters provided at load time match exactly to those at attach time. ifindex, attach_btf_id, etc fall into this category. Doing something similar it should be possible to avoid doing get_dumpable() at run-time.