On Wed, Apr 5, 2023 at 7:55 PM Yafang Shao <laoar.shao@xxxxxxxxx> wrote:
>
> It seems that I didn't describe the issue clearly.
> The container doesn't have CAP_SYS_ADMIN, but the CAP_SYS_ADMIN is
> required to run bpftool, so the bpftool running in the container
> can't get the ID of bpf objects or convert IDs to FDs.
> Is there something that I missed ?

Nothing. This is by design. bpftool needs sudo. That's all.

>
> > > --- a/kernel/bpf/syscall.c
> > > +++ b/kernel/bpf/syscall.c
> > > @@ -3705,9 +3705,6 @@ static int bpf_obj_get_next_id(const union bpf_attr *attr,
> > >         if (CHECK_ATTR(BPF_OBJ_GET_NEXT_ID) || next_id >= INT_MAX)
> > >                 return -EINVAL;
> > >
> > > -       if (!capable(CAP_SYS_ADMIN))
> > > -               return -EPERM;
> > > -
> > >         next_id++;
> > >         spin_lock_bh(lock);
> > >         if (!idr_get_next(idr, &next_id))
> > >
> > > Because the container doesn't have CAP_SYS_ADMIN enabled, while they
> > > only have CAP_BPF and other required CAPs.
> > >
> > > Another possible solution is that we run an agent in the host, and the
> > > user in the container who wants to get the bpf objects info in his
> > > container should send a request to this agent via unix domain socket.
> > > That is what we are doing now in our production environment. That
> > > said, each container has to run a client to get the bpf object fd.
> >
> > None of such hacks are necessary. People that debug bpf setups with bpftool
> > can always sudo.
> >
> > > There are some downsides,
> > > - It can't handle pinned bpf programs
> > >   For pinned programs, the user can get them from the pinned files
> > >   directly, so he can use bpftool in his case, only with some
> > >   complaints.
> > > - If the user attached the bpf prog, and then removed the pinned
> > >   file, but didn't detach it.
> > >   That happened. But this error case can't be handled.
> > > - There may be other corner cases that it can't fit.
> > >
> > > There's a solution to improve it, but we also need to change the
> > > kernel.
> > > That is, we can use the wasted space btf->name.
> > >
> > > diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> > > index b7e5a55..59d73a3 100644
> > > --- a/kernel/bpf/btf.c
> > > +++ b/kernel/bpf/btf.c
> > > @@ -5542,6 +5542,8 @@ static struct btf *btf_parse(bpfptr_t btf_data,
> > > u32 btf_data_size,
> > >                 err = -ENOMEM;
> > >                 goto errout;
> > >         }
> > > +       snprintf(btf->name, sizeof(btf->name), "%s-%d-%d", current->comm,
> > > +               current->pid, cgroup_id(task_cgroup(p, cpu_cgrp_id)));
> >
> > Unnecessary.
> > comm, pid, cgroup can be printed by bpftool without changing the kernel.
>
> Some questions,
> - What if the process exits after attaching the bpf prog and the prog
>   is not auto-detachable?
>   For example, the reuserport bpf prog is not auto-detachable. After
>   pins the reuserport bpf prog, a task can attach it through the pinned
>   bpf file, but if the task forgets to detach it and the pinned file is
>   removed, then it seems there's no way to figure out which task or
>   cgroup this prog belongs to...

You're saying that there is a bpf prog in the kernel without a
corresponding user space process? Meaning no user space process has an FD
that points to this prog, or an FD to a map that this prog is using?
In such a case this is truly a kernel bpf prog. It doesn't belong to
any cgroup.

> - Could you pls. explain in detail how to get comm, pid, or cgroup
>   from a pinned bpffs file?

A pinned bpf prog with no user space process holding an FD to it?
It's not part of any cgroup. Nothing to print.
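
To illustrate the point that ownership follows from open FDs: one way user
space can map a prog ID back to a process, with no kernel change, is to read
the "prog_id:" field the kernel exposes in /proc/<pid>/fdinfo/<fd> for bpf
prog FDs. The sketch below is only an illustration and is not bpftool's
implementation; the function names are hypothetical, and reading another
process's fdinfo still needs sufficient privileges (i.e. sudo).

```c
#include <stdio.h>
#include <string.h>

/*
 * Extract the "prog_id:" field from the text of /proc/<pid>/fdinfo/<fd>.
 * Returns the program ID, or -1 if the field is absent, which is the case
 * for any FD that is not a bpf prog FD.
 */
long fdinfo_prog_id(const char *text)
{
	const char *line = text;

	while (line && *line) {
		unsigned int id;

		/* The kernel prints "prog_id:\t<id>"; a space in the
		 * format matches the tab. */
		if (sscanf(line, "prog_id: %u", &id) == 1)
			return (long)id;

		/* Advance to the start of the next line, if any. */
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return -1;
}

/*
 * Read one fdinfo file (hypothetical helper) and return its prog_id,
 * or -1 if the file cannot be opened or is not a bpf prog FD.
 */
long fdinfo_file_prog_id(const char *path)
{
	char buf[4096];
	size_t n;
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	n = fread(buf, 1, sizeof(buf) - 1, f);
	fclose(f);
	buf[n] = '\0';
	return fdinfo_prog_id(buf);
}
```

A caller would walk /proc/<pid>/fdinfo/ for each pid and compare the result
against the prog ID of interest; comm is then available from
/proc/<pid>/comm and the cgroup from /proc/<pid>/cgroup. If no process
matches, that is the "nothing to print" case described above.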