On Fri, Jun 9, 2023 at 2:21 PM Toke Høiland-Jørgensen <toke@xxxxxxxxxx> wrote:
>
> Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> writes:
>
> > On Fri, Jun 9, 2023 at 4:17 AM Toke Høiland-Jørgensen <toke@xxxxxxxxxx> wrote:
> >>
> >> Andrii Nakryiko <andrii@xxxxxxxxxx> writes:
> >>
> >> > This patch set introduces new BPF object, BPF token, which allows to delegate
> >> > a subset of BPF functionality from privileged system-wide daemon (e.g.,
> >> > systemd or any other container manager) to a *trusted* unprivileged
> >> > application. Trust is the key here. This functionality is not about allowing
> >> > unconditional unprivileged BPF usage. Establishing trust, though, is
> >> > completely up to the discretion of respective privileged application that
> >> > would create a BPF token.
> >>
> >> I am not convinced that this token-based approach is a good way to solve
> >> this: having the delegation mechanism be one where you can basically
> >> only grant a perpetual delegation with no way to retract it, no way to
> >> check what exactly it's being used for, and that is transitive (can be
> >> passed on to others with no restrictions) seems like a recipe for
> >> disaster. I believe this was basically the point Casey was making as
> >> well in response to v1.
> >
> > Most of this can be added, if we really need to. Ability to revoke BPF
> > token is easy to implement (though of course it will apply only for
> > subsequent operations). We can allocate ID for BPF token just like we
> > do for BPF prog/map/link and let tools iterate and fetch information
> > about it. As for controlling who's passing what and where, I don't
> > think the situation is different for any other FD-based mechanism. You
> > might as well create a BPF map/prog/link, pass it through SCM_RIGHTS
> > or BPF FS, and that application can keep doing the same to other
> > processes.
>
> No, but every other fd-based mechanism is limited in scope.
> E.g., if you pass a map fd that's one specific map that can be passed
> around, with a token it's all operations (of a specific type) which is
> way broader.

It's not black and white. Once you have a BPF program FD, you can
attach it many times, for example, and cause regressions. Sure, here
we are talking about creating multiple BPF maps or loading multiple
BPF programs, so it's wider in scope, but still, it's not that
fundamentally different.

>
> > Ultimately, currently we have root permissions for applications that
> > need BPF. That's already very dangerous. But just because something
> > might be misused or abused doesn't prevent us from making a good
> > practical use of it, right?
>
> That's not a given. It's always a trade-off, and if the mechanism is
> likely to open up the system to additional risk that's not a good
> trade-off even if it helps in some case. I basically worry that this is
> the case here.
>
> > Also, there is LSM on top of all of this to override and control how
> > the BPF subsystem is used, regardless of BPF token. It can override
> > any of the privileges mechanism, capabilities, BPF token, whatnot.
>
> If this mechanism needs an LSM to be used safely, that's not incredibly
> confidence-inspiring. Security mechanisms should fail safe, which this
> one does not.

I proposed adding authoritative LSM hooks that would selectively allow
some BPF operations on a case-by-case basis. This was rejected, with
the argument that the best approach is to give the process the
privileges to do whatever it needs to do and then restrict it with LSM.
Ok, if not for user namespaces, that would mean giving the application
CAP_BPF+CAP_PERFMON+CAP_NET_ADMIN+CAP_SYS_ADMIN, and then restricting
it with LSM. Except with user namespaces that doesn't work. So that's
where BPF token comes in, and it does this more safely by allowing
coarse tuning of which subset of BPF operations is granted. And then
LSM should be used to further restrict it.
>
> I'm also worried that an LSM policy is the only way to disable the
> ability to create a token; with this in the kernel, I suddenly have to
> trust not only that all applications with BPF privileges will not load
> malicious code, but also that they won't (accidentally or maliciously)
> convey extra privileges on someone else. Seems a bit broad to have this
> ability (to issue tokens) available to everyone with access to the bpf()
> syscall, when (IIUC) it's only a single daemon in the system that would
> legitimately do this in the deployment you're envisioning.

Note, any process with real CAP_SYS_ADMIN. Let's not forget that.

But would you feel better if BPF_TOKEN_CREATE was guarded behind
sysctl or Kconfig?

Ultimately, worrying is fine, but there are real problems that need to
be solved. And not doing anything isn't a great option.

>
> >> If the goal is to enable a privileged application (such as a container
> >> manager) to grant another unprivileged application the permission to
> >> perform certain bpf() operations, why not just proxy the operations
> >> themselves over some RPC mechanism? That way the granting application
> >
> > It's explicitly what we *do not* want to do, as it is a major problem
> > and logistical complication. Every single application will have to be
> > rewritten to use such a special daemon/service and its API, which is
> > completely different from bpf() syscall API. It invalidates the use of
> > all the libbpf (and other bpf libraries') APIs, BPF skeleton is
> > incompatible with this. It's a nightmare. I've got feedback from
> > people in another company that do have BPF service with just a tiny
> > subset of BPF functionality delegated to such service, and it's a pain
> > and definitely not a preferred way to do things.
>
> But weren't you proposing that libbpf should be able to transparently
> look for tokens and load them without any application changes? Why can't
> libbpf be taught to use an RPC socket in a similar fashion?
> It basically boils down to something like:
>
> static inline int sys_bpf(enum bpf_cmd cmd, union bpf_attr *attr,
>                           unsigned int size)
> {
>   if (!stat("/run/bpf.sock")) {
>     sock = open_socket("/run/bpf.sock");
>     write_to(sock, cmd, attr, size);
>     return read_response(sock);
>   } else {
>     return syscall(__NR_bpf, cmd, attr, size);
>   }
> }
>

Well, for one, Meta will use its own Thrift-based RPC protocol. Google
might use something internal for them using gRPC, someone else would
want to utilize systemd, yet others will use yet another
implementation. RPC introduces more failure modes. While with syscall
we know that operation either succeeded or failed, with RPC we'll have
to deal with "maybe", if it was some communication error.

Let's not trivialize adding, using, and supporting the RPC version of
bpf() syscall.

> > Just think about having to mirror a big chunk of bpf() syscall as an
> > RPC. So no, BPF proxy is definitely not a good solution.
>
> The daemon at the other side of the socket in the example above doesn't
> *have* to be taught all the semantics of the syscall, it can just look
> at the command name and make a decision based on that and the identity
> of the socket peer, then just pass the whole thing to the kernel if the
> permission check passes.

Let's not trivialize the consequences of adding an RPC protocol to all
this, please. No matter in what form or shape.

>
> >> can perform authentication checks on every operation and ensure its
> >> origins are sound at the time it is being made. Instead of just writing
> >> a blank check (in the form of a token) and hoping the receiver of it is
> >> not compromised...
> >
> > All this could and should be done through LSM in much more decoupled
> > and transparent (to application) way. BPF token doesn't prevent this.
> > It actually helps with this, because organizations can actually
> > dictate that operations that do not provide BPF token are
> > automatically rejected, and those that do provide BPF token can be
> > further checked and granted or rejected based on specific BPF token
> > instance.
>
> See above re: needing an LSM policy to make this safe...

See above. We are talking about the CAP_SYS_ADMIN-enabled process.
It's not safe by definition already.

>
> -Toke