On Mon, Jun 26, 2023, at 8:23 AM, Daniel Borkmann wrote:
> On 6/24/23 5:28 PM, Andy Lutomirski wrote:
>> On Sat, Jun 24, 2023, at 6:59 AM, Andy Lutomirski wrote:
>>> On Fri, Jun 23, 2023, at 4:23 PM, Daniel Borkmann wrote:
>>>
>>> If this series was about passing a “may load kernel modules” token
>>> around, I think it would get an extremely chilly reception, even though
>>> we have module signatures. I don’t see anything about BPF that makes
>>> BPF tokens more reasonable unless a real security model is developed
>>> first.
>>
>> To be clear, I'm not saying that there should not be a mechanism to use BPF from a user namespace. I'm saying the mechanism should have explicit access control. It wouldn't need to solve all problems right away, but it should allow incrementally more features to be enabled as the access control solution gets more powerful over time.
>>
>> BPF, unlike kernel modules, has a verifier. While it would be a departure from current practice, permission to use BPF could come with an explicit list of allowed functions and allowed hooks.
>>
>> (The hooks wouldn't just be a list, presumably -- permission to install an XDP program would be scoped to networks over which one has CAP_NET_ADMIN, presumably. Other hooks would have their own scoping. Attaching to a cgroup should (and maybe already does?) require some kind of permission on the cgroup. Etc.)
>>
>> If new, more restrictive functions are needed, they could be added.
>
> Wasn't this the idea of the BPF tokens proposal, meaning you could create them with
> restricted access as you mentioned - allowing an explicit subset of program types to
> be loaded, subset of helpers/kfuncs, map types, etc. Given you pass in this token
> context upon program load-time (resp. map creation), the verifier is then extended
> for restricted access. For example, see the bpf_token_allow_{cmd,map_type,prog_type}()
> in this series.
> The user namespace relation was part of the use cases, but not strictly
> part of the mechanism itself in this series.

Hmm. It's very coarse grained. Also, the bpf() attach API seems to be largely (completely?) missing what I would expect to be basic access controls on the things being attached to. For example, the whole cgroup_bpf_prog_attach() path seems to be entirely missing any checks as to whether its caller has any particular permission over the cgroup in question. It doesn't even check whether the cgroup is being accessed from the current userns (i.e. whether the fd refers to a struct file with f_path.mnt belonging to the current userns). So the API in this patchset has no way to restrict permission to attach to cgroups to only apply to cgroups belonging to the container.

>
> With regards to the scoping, are you saying that the current design with the bitmasks
> in the token create uapi is not flexible enough? If yes, what concrete alternative do
> you propose?
>
>> Alternatively, people could try a limited form of BPF proxying. It wouldn't need to be a full proxy -- an outside daemon really could approve the attachment of a BPF program, and it could parse the program, examine the list of functions it uses and what the proposed attachment is to, and make an educated decision. This would need some API changes (maybe), but it seems eminently doable.
>
> Thinking about this from a k8s environment angle, I think this wouldn't really be
> practical for various reasons: you now need to maintain two implementations for your
> container images which ship BPF, one which loads programs as today, and another one
> which talks to this proxy if available,

This seems fairly trivially solvable. Agree on an API, say using UNIX sockets to /var/run/bpfd/whatever.socket. (Or maybe /var/lib? I’m not sure there’s universal agreement on where things like this go.)

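For the outside-daemon idea ("parse the program, examine the list of functions it uses"), the helper-call part of that check could start as small as the C sketch below. It is hypothetical throughout: `approve_helpers()`, the allowlist, and the local `struct insn` (which redeclares the fields of the kernel's `struct bpf_insn` so the sketch builds without kernel headers) are illustrations, not part of any existing bpfd or kernel API. It only looks at helper calls; a real policy would also have to weigh kfuncs, map types, and the proposed attach target.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Same fields as struct bpf_insn in <linux/bpf.h>, redeclared locally. */
struct insn {
    uint8_t code;        /* opcode */
    uint8_t dst_reg : 4; /* destination register */
    uint8_t src_reg : 4; /* source register; selects the call kind for calls */
    int16_t off;         /* signed offset */
    int32_t imm;         /* immediate; the BPF_FUNC_* helper ID for helper calls */
};

#define OP_CALL 0x85 /* BPF_JMP | BPF_CALL */
#define OP_EXIT 0x95 /* BPF_JMP | BPF_EXIT */

/*
 * Hypothetical proxy-side check: returns true only if every helper call in
 * prog uses a helper ID from the allowlist. src_reg 0 is a helper call;
 * src_reg 1 (bpf-to-bpf call) is skipped because the callee's instructions
 * are in the same array and get checked on their own; anything else,
 * e.g. src_reg 2 (kfunc call), is rejected as a conservative default.
 */
static bool approve_helpers(const struct insn *prog, size_t n,
                            const int32_t *allowed, size_t n_allowed)
{
    assert(prog != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (prog[i].code != OP_CALL)
            continue;
        if (prog[i].src_reg == 1)
            continue;
        if (prog[i].src_reg != 0)
            return false;
        bool ok = false;
        for (size_t j = 0; j < n_allowed; j++) {
            if (prog[i].imm == allowed[j]) {
                ok = true;
                break;
            }
        }
        if (!ok)
            return false;
    }
    return true;
}
```

Helper IDs here are the `BPF_FUNC_*` values from `<linux/bpf.h>` (for example, 1 is `bpf_map_lookup_elem` and 5 is `bpf_ktime_get_ns`), so a program calling only those two passes an allowlist of `{1, 5}` and fails one of `{1}`. Because the decision lives in userspace, swapping this for a richer or audited policy does not require a kernel change.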
The exact same API works uncontained (bpfd running, probably socket-activated) from a binary in the system and as a bind-mount from outside. I don’t know k8s well at all, but it looks like hostPath can do exactly this. Off the top of my head, I don’t know whether systemd’s .socket can be configured the right way so the same configuration would work contained and uncontained. One could certainly work around *that* by having two different paths tried in succession, but that seems a bit silly.

This actually seems easier than supplying bpf tokens to a container.

> then you also need to standardize and support the various loader libraries
> for this, you need to deal with yet one more component in your cluster which
> could fail (compared to talking to kernel directly), and being dependent on
> new proxy functionality becomes similar to waiting for new kernels to hit
> mainstream; it could potentially take a very long time until production upgrades.
> What is being proposed here in this regard is less complex given no extra proxy is
> involved.

I would certainly prefer a kernel-based solution.

A userspace solution makes it easy to apply some kind of flexible approval and audit policy to the BPF program. I can imagine all kinds of ways that a fleet operator might want to control what can run, and trying to stick it in the kernel seems rather complex and awkward to customize.

I suppose a bpf token could be set up to call out to its creator for permission to load a program, which would involve a different set of tradeoffs.
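To make the pieces of this thread concrete, here is a hypothetical user-space C model of a token check that combines them: a coarse program-type bitmask in the spirit of the series' bpf_token_allow_{cmd,map_type,prog_type}() (only program types are modeled, for brevity), an attach check requiring the target to belong to the token's user namespace (modeling the f_path.mnt check described as missing from cgroup_bpf_prog_attach()), and an optional callout to the token's creator. Every name here (`struct token_model`, `token_allows_load()`, `token_allows_attach()`, `example_creator_policy()`) is invented for illustration; this is a sketch under those assumptions, not the series' implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for a user namespace; compared by identity. */
struct userns { int id; };

/* Hypothetical callout asking the token's creator to approve a load. */
typedef bool (*approve_cb)(uint32_t prog_type, void *ctx);

struct token_model {
    uint64_t allowed_prog_types;  /* bit N set => prog type N may be loaded */
    const struct userns *owner;   /* namespace the token was issued for */
    approve_cb creator_approve;   /* optional; NULL means no callout */
    void *creator_ctx;            /* opaque state passed to the callout */
};

static bool bit_set(uint64_t mask, uint32_t n)
{
    return n < 64 && ((mask >> n) & 1u);
}

/* Coarse bitmask check first, then the creator's (optional) finer policy. */
static bool token_allows_load(const struct token_model *t, uint32_t prog_type)
{
    assert(t != NULL);
    if (!bit_set(t->allowed_prog_types, prog_type))
        return false;
    if (t->creator_approve != NULL)
        return t->creator_approve(prog_type, t->creator_ctx);
    return true;
}

/* Attach scoping: the target (e.g. the mount a cgroup fd was opened
 * through) must be owned by the same user namespace as the token. */
static bool token_allows_attach(const struct token_model *t,
                                const struct userns *target_owner)
{
    assert(t != NULL);
    return target_owner == t->owner;
}

/* Example creator policy: ctx points at the creator's own, stricter mask. */
static bool example_creator_policy(uint32_t prog_type, void *ctx)
{
    const uint64_t *creator_mask = ctx;
    return bit_set(*creator_mask, prog_type);
}
```

Keeping the callout optional means the bitmask still works on its own, while a creator that wants flexible, auditable policy can plug it in; the cost is that every load now waits on the creator, which is where the different set of tradeoffs comes in.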