On Mon, Apr 03, 2023 at 11:05:25AM +0800, Yafang Shao wrote: > On Mon, Apr 3, 2023 at 7:37 AM Alexei Starovoitov > <alexei.starovoitov@xxxxxxxxx> wrote: > > > > On Tue, Mar 28, 2023 at 11:47:31AM +0800, Yafang Shao wrote: > > > On Tue, Mar 28, 2023 at 3:04 AM Song Liu <song@xxxxxxxxxx> wrote: > > > > > > > > On Sun, Mar 26, 2023 at 2:22 AM Yafang Shao <laoar.shao@xxxxxxxxx> wrote: > > > > > > > > > > Currently only CAP_SYS_ADMIN can iterate BPF object IDs and convert IDs > > > > > to FDs, that's intended for BPF's security model[1]. Not only does it > > > > > prevent non-privilidged users from getting other users' bpf program, but > > > > > also it prevents the user from iterating his own bpf objects. > > > > > > > > > > In container environment, some users want to run bpf programs in their > > > > > containers. These users can run their bpf programs under CAP_BPF and > > > > > some other specific CAPs, but they can't inspect their bpf programs in a > > > > > generic way. For example, the bpftool can't be used as it requires > > > > > CAP_SYS_ADMIN. That is very inconvenient. > > > > > > > > Agreed that it is important to enable tools like bpftool without > > > > CAP_SYS_ADMIN. However, I am not sure whether we need a new > > > > namespace for this. Can we reuse some existing namespace for this? > > > > > > > > > > It seems we can't. > > > > Yafang, > > > > It's a Nack. > > > > The only thing you've been trying to "solve" with bpf namespace is to > > allow 'bpftool prog show' iterate progs in the "namespace" without CAP_SYS_ADMIN. > > The concept of bpf namespace is not even close to be thought through. > > Right, it is more likely a PoC in its current state. > > > Others pointed out the gaps in the design. Like bpffs. There are plenty. > > Please do not send patches like this in the future. > > The reason I sent it with an early state is that I want to get some > early feedback from the community ahead of the LSF/MM/BPF workshop, > then I can improve it based on these feedbacks and present it more > specifically at the workshop. Then the discussion will be more > effective. > > > You need to start with describing the problem you want to solve, > > then propose _several_ solutions, describe their pros and cons, > > solicit feedback, present at the conferences (like LSFMMBPF or LPC), > > and when the community agrees that 1. problem is worth solving, > > 2. the solution makes sense, only then work on patches. > > > > I would like to give a short discussion on the BPF namespace if > everything goes fine. Not in this shape of BPF namespace as done in this patch set. We've talked about BPF namespace in the past. This is not it. > > "In container environment, some users want to run bpf programs in their containers." > > is something Song brought up at LSFMMBPF a year ago. > > At that meeting most of the folks agreed that there is a need to run bpf > > in containers and make sure that the effect of bpf prog is limited to a container. > > This new namespace that creates virtual IDs for progs and maps doesn't come > > close in solving this task. > > Currently in our production environment, all the containers running > bpf programs are privileged, that is risky. So actually the goal of > the BPF namespace is to make them (or part of them) non-privileged. > But some of the abilities of these bpf programs will be lost in this > procedure, like the debug-bility with bpftool, so we need to fix it. > Agree with you that this goal is far from making bpf programs safely > running in a container environment. I disagree that allowing admin to run bpftool without sudo is a task worth solving. The visibility of bpf progs in a container is a different task. Without doing any kernel changes we can add a flag to bpftool to let 'bpftool prog show' list progs that were loaded by processes in the same cgroup. bpftool already does prog->pid mapping with bpf iterators. It can filter by cgroup just as well.