On 6/2/2023 7:59 AM, Andrii Nakryiko wrote:
> *Resending with trimmed CC list because original version didn't make it to
> the mailing list.*
>
> This patch set introduces new BPF object, BPF token, which allows to delegate
> a subset of BPF functionality from privileged system-wide daemon (e.g.,
> systemd or any other container manager) to a *trusted* unprivileged
> application. Trust is the key here. This functionality is not about allowing
> unconditional unprivileged BPF usage. Establishing trust, though, is
> completely up to the discretion of respective privileged application that
> would create a BPF token.

Token based privilege has a number of well understood weaknesses,
none of which I see addressed here. I also have a real problem with
the notion of "trusted unprivileged" where trust is established by
a user space application. Ignoring the possibility of malicious code
for the moment, the opportunity for accidental privilege leakage is
huge. It would be trivial (and tempting) to create a privileged BPF
"shell" that would then be allowed to "trust" any application and
run it with privilege by passing it a token.

>
> The main motivation for BPF token is a desire to enable containerized
> BPF applications to be used together with user namespaces. This is currently
> impossible, as CAP_BPF, required for BPF subsystem usage, cannot be namespaced
> or sandboxed, as a general rule. E.g., tracing BPF programs, thanks to BPF
> helpers like bpf_probe_read_kernel() and bpf_probe_read_user() can safely read
> arbitrary memory, and it's impossible to ensure that they only read memory of
> processes belonging to any given namespace. This means that it's impossible to
> have namespace-aware CAP_BPF capability, and as such another mechanism to
> allow safe usage of BPF functionality is necessary. BPF token and delegation
> of it to a trusted unprivileged applications is such mechanism.
> Kernel makes no assumption about what "trusted" constitutes in any
> particular case, and it's up to specific privileged applications and their
> surrounding infrastructure to decide that. What kernel provides is a set of
> APIs to create and tune BPF token, and pass it around to privileged BPF
> commands that are creating new BPF objects like BPF programs, BPF maps, etc.
>
> Previous attempt at addressing this very same problem ([0]) attempted to
> utilize authoritative LSM approach, but was conclusively rejected by upstream
> LSM maintainers. BPF token concept is not changing anything about LSM
> approach, but can be combined with LSM hooks for very fine-grained security
> policy. Some ideas about making BPF token more convenient to use with LSM (in
> particular custom BPF LSM programs) was briefly described in recent LSF/MM/BPF
> 2023 presentation ([1]). E.g., an ability to specify user-provided data
> (context), which in combination with BPF LSM would allow implementing a very
> dynamic and fine-granular custom security policies on top of BPF token. In the
> interest of minimizing API surface area discussions this is going to be
> added in follow up patches, as it's not essential to the fundamental concept
> of delegatable BPF token.
>
> It should be noted that BPF token is conceptually quite similar to the idea of
> /dev/bpf device file, proposed by Song a while ago ([2]). The biggest
> difference is the idea of using virtual anon_inode file to hold BPF token and
> allowing multiple independent instances of them, each with its own set of
> restrictions. BPF pinning solves the problem of exposing such BPF token
> through file system (BPF FS, in this case) for cases where transferring FDs
> over Unix domain sockets is not convenient. And also, crucially, BPF token
> approach is not using any special stateful task-scoped flags. Instead, bpf()
> syscall accepts token_fd parameters explicitly for each relevant BPF command.
> This addresses main concerns brought up during the /dev/bpf discussion, and
> fits better with overall BPF subsystem design.
>
> This patch set adds a basic minimum of functionality to make BPF token useful
> and to discuss API and functionality. Currently only low-level libbpf APIs
> support passing BPF token around, allowing to test kernel functionality, but
> for the most part is not sufficient for real-world applications, which
> typically use high-level libbpf APIs based on `struct bpf_object` type. This
> was done with the intent to limit the size of patch set and concentrate on
> mostly kernel-side changes. All the necessary plumbing for libbpf will be sent
> as a separate follow up patch set once kernel support makes it upstream.
>
> Another part that should happen once kernel-side BPF token is established, is
> a set of conventions between applications (e.g., systemd), tools (e.g.,
> bpftool), and libraries (e.g., libbpf) about sharing BPF tokens through BPF FS
> at well-defined locations to allow applications take advantage of this in
> automatic fashion without explicit code changes on BPF application's side.
> But I'd like to postpone this discussion to after BPF token concept lands.
>
>   [0] https://lore.kernel.org/bpf/20230412043300.360803-1-andrii@xxxxxxxxxx/
>   [1] http://vger.kernel.org/bpfconf2023_material/Trusted_unprivileged_BPF_LSFMM2023.pdf
>   [2] https://lore.kernel.org/bpf/20190627201923.2589391-2-songliubraving@xxxxxx/
>
> Andrii Nakryiko (18):
>   bpf: introduce BPF token object
>   libbpf: add bpf_token_create() API
>   selftests/bpf: add BPF_TOKEN_CREATE test
>   bpf: move unprivileged checks into map_create() and bpf_prog_load()
>   bpf: inline map creation logic in map_create() function
>   bpf: centralize permissions checks for all BPF map types
>   bpf: add BPF token support to BPF_MAP_CREATE command
>   libbpf: add BPF token support to bpf_map_create() API
>   selftests/bpf: add BPF token-enabled test for BPF_MAP_CREATE command
>   bpf: add BPF token support to BPF_BTF_LOAD command
>   libbpf: add BPF token support to bpf_btf_load() API
>   selftests/bpf: add BPF token-enabled BPF_BTF_LOAD selftest
>   bpf: keep BPF_PROG_LOAD permission checks clear of validations
>   bpf: add BPF token support to BPF_PROG_LOAD command
>   bpf: take into account BPF token when fetching helper protos
>   bpf: consistenly use BPF token throughout BPF verifier logic
>   libbpf: add BPF token support to bpf_prog_load() API
>   selftests/bpf: add BPF token-enabled BPF_PROG_LOAD tests
>
>  drivers/media/rc/bpf-lirc.c                  |   2 +-
>  include/linux/bpf.h                          |  66 ++-
>  include/linux/filter.h                       |   2 +-
>  include/uapi/linux/bpf.h                     |  74 +++
>  kernel/bpf/Makefile                          |   2 +-
>  kernel/bpf/arraymap.c                        |   2 +-
>  kernel/bpf/bloom_filter.c                    |   3 -
>  kernel/bpf/bpf_local_storage.c               |   3 -
>  kernel/bpf/bpf_struct_ops.c                  |   3 -
>  kernel/bpf/cgroup.c                          |   6 +-
>  kernel/bpf/core.c                            |   3 +-
>  kernel/bpf/cpumap.c                          |   4 -
>  kernel/bpf/devmap.c                          |   3 -
>  kernel/bpf/hashtab.c                         |   6 -
>  kernel/bpf/helpers.c                         |   6 +-
>  kernel/bpf/inode.c                           |  26 ++
>  kernel/bpf/lpm_trie.c                        |   3 -
>  kernel/bpf/queue_stack_maps.c                |   4 -
>  kernel/bpf/reuseport_array.c                 |   3 -
>  kernel/bpf/stackmap.c                        |   3 -
>  kernel/bpf/syscall.c                         | 429 ++++++++++++++----
>  kernel/bpf/token.c                           | 141 ++++++
>  kernel/bpf/verifier.c                        |  13 +-
>  kernel/trace/bpf_trace.c                     |   2 +-
>  net/core/filter.c                            |  36 +-
>  net/core/sock_map.c                          |   4 -
>  net/ipv4/bpf_tcp_ca.c                        |   2 +-
>  net/netfilter/nf_bpf_link.c                  |   2 +-
>  net/xdp/xskmap.c                             |   4 -
>  tools/include/uapi/linux/bpf.h               |  74 +++
>  tools/lib/bpf/bpf.c                          |  32 +-
>  tools/lib/bpf/bpf.h                          |  24 +-
>  tools/lib/bpf/libbpf.map                     |   1 +
>  .../selftests/bpf/prog_tests/libbpf_probes.c |   4 +
>  .../selftests/bpf/prog_tests/libbpf_str.c    |   6 +
>  .../testing/selftests/bpf/prog_tests/token.c | 282 ++++++++++++
>  .../bpf/prog_tests/unpriv_bpf_disabled.c     |   6 +-
>  37 files changed, 1098 insertions(+), 188 deletions(-)
>  create mode 100644 kernel/bpf/token.c
>  create mode 100644 tools/testing/selftests/bpf/prog_tests/token.c
>