On Wed, Sep 28, 2022 at 1:54 AM Lorenz Bauer <oss@xxxxxx> wrote: > > On Thu, 15 Sep 2022, at 11:30, Lorenz Bauer wrote: > > Hi list, > > > > Here is a summary of the talk I gave at LPC '22 titled "Closing the BPF > > map permission loophole", with slides at [0]. > > I've put this topic on the agenda of the 2022-10-06 BPF office hours to get some maintainer attention. Details are here: https://docs.google.com/spreadsheets/d/1LfrDXZ9-fdhvPEp_LHkxAMYyxxpwBXjywWa0AejEveU/edit > > Best So after the office hours I had an offline whiteboard discussion with Alexei explaining more precisely what I was proposing, and it became apparent that some of the things I was proposing weren't exactly clear, and thus people were left confused about the solution I was talking about. So I'll try to summarize it a bit and add some more specifics. Hopefully that will help, because I still believe we can solve this problem moving forward. But first, two notes. 1) Backporting this is going to be hard, and I don't think that should be the goal, it's going to be too intrusive, probably. 2) It turned out that we currently don't store user-space-side read/write permissions on struct bpf_map itself, and we'd need to do that as a preliminary step here. Below I just assume that struct bpf_map records all the bpf-side and user-side read/write permissions. So, the overall idea is that instead of fetching struct bpf_map point for all kinds of FD-based operations (bpf_obj_get, map_fd_by_id, even bpf_map_create) we are always working with a view of a map, and that "view" is a separate struct/object, something like: struct bpf_map_view { struct bpf_map *map; /* BPF_F_RDONLY | BPF_F_WRONLY, but we can later also add BPF-side flags: BPF_F_RDONLY_PROG | BPF_F_WRONLY_PROG */ int access_flags; } So whenever we work with map by FD, we get struct bpf_map_view (i.e., we store struct bpf_map_view inside file->private and inode->i_private). The semantics of view->access_flags is that it is superimposed on top of bpf_map->map_flags (specifically its BPF_F_RDONLY | BPF_F_WRONLY parts, later also BPF_F_RDONLY_PROG | BPF_F_WRONLY_PROG). This means that if struct bpf_map is R/W, but our current bpf_map_view says BPF_F_RDONLY, then only read-only access is allowed through that FD. On the other hand, if bpf_map itself is only BPF_F_RDONLY, but we somehow go bpf_map_view with BPF_F_RDONLY | BPF_F_WRONLY (e.g., due to chmod loophole), then it doesn't matter, it's still BPF_F_RDONLY, no write access. We can try preventing such situations in some circumstances, but as we showed with chmod() it's impossible to prevent in general. So, just to hopefully make it a bit clearer, let's discuss a use case that a bunch of people had in mind. Root/CAP_BPF user created R/W bpf_map, but wants to pin it in one BPFFS path as R/W, so that it can be later opened as R/W and modified. This BPFFS path will be restricted through FS permissions to only allow it to be opened by a privileged user/group. But, that same original root/CAP_BPF user would like to also create a read-only BPFFS pinning of that same map, and let unprivileged user(s) to open and work with that map, but only perform read-only operations (e.g., BPF_MAP_LOOKUP_ELEM command). Let's see how that works in this new bpf_map_view model. 1. root/CAP_BPF user does BPF_MAP_CREATE operation, struct bpf_map is created with map_flags BPF_F_RDONLY | BPF_F_WRONLY. Also, immediately struct bpf_map_view is created with same BPF_F_RDONLY | BPF_F_WRONLY access_flags. struct bpf_map_view keeps reference on struct bpf_map, struct bpf_map_view is assigned to struct file->private, new FD is returned to user-space. 2. That same root/CAP_BPF user does BPF_OBJ_PIN and specifies that they want R/W pinning (through file_flags). Kernel clones/copies struct bpf_map_view (I think bpf_map_view shouldn't be shared between files/inodes, each file/inode has its own personal copy; but we can work out the details later), sets (keeps in this case) its access_flags as BPF_F_RDONLY | BPF_F_WRONLY. After that they are free to chown/chmod as necessary to prevent unprivileged user from doing anything with that BPFFS file, if necessary. 3. Now, for the read-only pinning. User does another BPF_OBJ_PIN using original R/W map FD, but now they specify file_flags to only allow BPF_F_RDONLY (I'm too lazy to check what exact flag we pass there to communicate this intent, it's not that important for this discussion). At this point, kernel creates a new struct bpf_map_view, pointing to struct bpf_map, but this time access_flags have only BPF_F_RDONLY set. Then we proceed to creating an inode, its i_private is assigned this new R/O bpf_map_view. The user should chmod/chown pinned BPFFS file appropriately to allow unprivileged users to BPF_OBJ_GET it. Now, let's assume we are unprivileged user who wants to work with that pinned BPF map. When we do BPF_OBJ_GET on that read-only pinned file, kernel fetches struct bpf_map_view from inode->i_private which has access_flags as BPF_F_RDONLY. That's it, there is no way we can do update on that map, kernel will reject that even though struct bpf_map itself allows BPF_F_WRONLY. Note, though, that once we checked everything, as we create a new struct file and return new FD to user-space, that new struct file will have *yet another copy* of struct bpf_map_view, cloned from inode's bpf_map_view (recall that I was proposing to have 1-to-1 mapping between file/inode and bpf_map_view). Let's now assume we are sneaky bastards and chmod that second pinned BPFFS file to allow r/w file permissions. When we do BPF_OBJ_GET, again, we'll fetch struct bpf_map_view which enforce BPF_F_RDONLY (only), despite file itself having writable permissions. We can argue if we should reject such BPF_OBJ_GET command or silently "downgrade" to read-only view, that's beside the point. Hopefully this is a bit clearer. One last note. When we are talking about BPF_OBJ_GET, we are actually going to be dealing with 4 layers of read and write permissions: 1) struct bpf_map's "inherent" permissions 2) struct bpf_map_view's access_flags 3) struct file's FS read/write permissions 4) union bpf_attr's file_flags specified for BPF_OBJ_GET While that's a lot, we always intersect them and keep only the most restrictive combination. So if at any of the layers we have read-only permissions, resulting *new struct bpf_map_view* will only specify BPF_F_RDONLY. E.g., if at layers 1, 2, and 4 we allow BPF_F_WRONLY, but BPFFS file permission (layer #3 above) at that moment is read-only, we should be only getting read-only view of BPF map. P.S. We can extend this to BPF-side BPF_F_RDONLY_PROG | BPF_F_WRONLY_PROG as well, it's just that we'll need to define how user will control that. E.g., FS read-only permission, does it restrict both user-space and BPF-view, or just user-space view? We can certainly extend file_flags to allow users to get BPF-side read-only and user-space-side read-write BPF map FD, for example. Obviously, BPF verifier would need to know about struct bpf_map_view when accepting BPF map FD in ldimm64 and such. Alright, that's what I think I had to say. Please point out scenarios that won't be covered by this bpf_map_view model. Thanks.