On Fri, Jun 9, 2023 at 11:32 AM Andy Lutomirski <luto@xxxxxxxxxx> wrote:
>
> On Wed, Jun 7, 2023, at 4:53 PM, Andrii Nakryiko wrote:
> > This patch set introduces new BPF object, BPF token, which allows to delegate
> > a subset of BPF functionality from privileged system-wide daemon (e.g.,
> > systemd or any other container manager) to a *trusted* unprivileged
> > application. Trust is the key here. This functionality is not about allowing
> > unconditional unprivileged BPF usage. Establishing trust, though, is
> > completely up to the discretion of respective privileged application that
> > would create a BPF token.
> >
>
> I skimmed the description and the LSFMM slides.
>
> Years ago, I sent out a patch set to start down the path of making the bpf() API make sense when used in less-privileged contexts (regarding access control of BPF objects and such). It went nowhere.
>
> Where does BPF token fit in? Does a kernel with these patches applied actually behave sensibly if you pass a BPF token into a container?

Yes?.. In the sense that it is possible to create BPF programs and BPF
maps from inside the container (with BPF token). Right now under user
namespace it's impossible no matter what you do.

> Giving a way to enable BPF in a container is only a small part of the overall task -- making BPF behave sensibly in that container seems like it should also be necessary.

BPF is still a privileged thing. You can't just say that any
unprivileged application should be able to use BPF. That's why BPF
token is about trusting unpriv application in a controlled environment
(production) to not do something crazy. It can be enforced further
through LSM usage, but in a lot of cases, when dealing with internal
production applications it's enough to have a proper application
design and rely on code review process to avoid any negative effects.

So the privileged daemon (container manager) will be configured with
the knowledge of which services/containers are allowed to use BPF, and
will grant a BPF token only to those that were explicitly allowlisted.
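
A minimal sketch of that allowlisting policy, just to make the idea concrete. This is not code from the patch set: `ContainerManager`, `TokenGrant`, `maybe_grant_token`, and the command names are all hypothetical, and the actual token creation (which would go through the kernel) is only indicated by a comment.

```python
# Hypothetical sketch of the allowlist policy described above.
# None of these names come from the patch set or from any real
# container manager; they exist only for illustration.
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Optional, Set


@dataclass(frozen=True)
class TokenGrant:
    """Stand-in for a BPF token handed to a trusted service."""
    service: str
    allowed_cmds: FrozenSet[str]


@dataclass
class ContainerManager:
    # service name -> subset of BPF functionality it may be delegated
    allowlist: Dict[str, Set[str]] = field(default_factory=dict)

    def maybe_grant_token(self, service: str) -> Optional[TokenGrant]:
        """Return a grant only for explicitly allowlisted services."""
        cmds = self.allowlist.get(service)
        if cmds is None:
            # Not allowlisted: no token, so BPF stays unavailable.
            return None
        # Assumption: a real manager would create the BPF token here
        # and pass it into the container; this sketch only records
        # what would be delegated.
        return TokenGrant(service=service, allowed_cmds=frozenset(cmds))
```

With an allowlist such as `{"net-monitor": {"prog_load", "map_create"}}`, `maybe_grant_token("net-monitor")` returns a grant limited to those two entries, while any service that is not listed gets `None`.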