Re: [RFC PATCH bpf-next 00/13] bpf: Introduce BPF namespace

Yafang Shao <laoar.shao@xxxxxxxxx> · Thu, 6 Apr 2023 11:22:01 +0800

On Thu, Apr 6, 2023 at 11:06 AM Alexei Starovoitov
<alexei.starovoitov@xxxxxxxxx> wrote:
>
> On Wed, Apr 5, 2023 at 7:55 PM Yafang Shao <laoar.shao@xxxxxxxxx> wrote:
> >
> > It seems that I didn't describe the issue clearly.
> > The container doesn't have CAP_SYS_ADMIN, but CAP_SYS_ADMIN is
> > required to run bpftool, so bpftool running in the container
> > can't get the IDs of bpf objects or convert IDs to FDs.
> > Is there something I missed?
>
> Nothing. This is by design. bpftool needs sudo. That's all.
>

Hmm, what I'm trying to do is make bpftool run without sudo.
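For reference, the check in question sits on the ID-walking path. The sketch below walks program IDs with BPF_PROG_GET_NEXT_ID through the raw bpf(2) syscall, which is roughly the loop bpftool runs for `prog show`. It assumes Linux UAPI headers providing <linux/bpf.h>, and count_prog_ids is a hypothetical helper name. On a kernel that still has the capable() check quoted below, a caller without CAP_SYS_ADMIN fails on the first call with EPERM.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/bpf.h>

/* Hypothetical helper: count visible BPF program IDs by walking
 * BPF_PROG_GET_NEXT_ID. Returns the count, or -1 with errno set
 * (EPERM when the caller lacks CAP_SYS_ADMIN on a kernel that keeps
 * the capable() check in bpf_obj_get_next_id()). */
static int count_prog_ids(void)
{
    union bpf_attr attr;
    __u32 id = 0;
    int count = 0;

    for (;;) {
        memset(&attr, 0, sizeof(attr));
        attr.start_id = id;            /* next ID strictly greater than this */
        if (syscall(__NR_bpf, BPF_PROG_GET_NEXT_ID, &attr, sizeof(attr)) < 0)
            return errno == ENOENT ? count : -1;   /* ENOENT: no more IDs */
        id = attr.next_id;
        count++;
    }
}
```

Note that a container's seccomp profile may also turn bpf(2) into EPERM, so an EPERM here does not by itself prove the capability check is what rejected the call.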

>
> >
> > > > --- a/kernel/bpf/syscall.c
> > > > +++ b/kernel/bpf/syscall.c
> > > > @@ -3705,9 +3705,6 @@ static int bpf_obj_get_next_id(const union bpf_attr *attr,
> > > >         if (CHECK_ATTR(BPF_OBJ_GET_NEXT_ID) || next_id >= INT_MAX)
> > > >                 return -EINVAL;
> > > >
> > > > -       if (!capable(CAP_SYS_ADMIN))
> > > > -               return -EPERM;
> > > > -
> > > >         next_id++;
> > > >         spin_lock_bh(lock);
> > > >         if (!idr_get_next(idr, &next_id))
> > > >
> > > > This is because the container doesn't have CAP_SYS_ADMIN; it
> > > > only has CAP_BPF and the other required capabilities.
> > > >
> > > > Another possible solution is to run an agent on the host: a user
> > > > in a container who wants info about the bpf objects in that
> > > > container sends a request to this agent over a unix domain socket.
> > > > That is what we are doing now in our production environment. This
> > > > means each container has to run a client to get the bpf object fd.
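The message doesn't say how the agent hands the fd back to the client. One common mechanism is SCM_RIGHTS ancillary data on the unix domain socket, sketched below with hypothetical helpers send_fd and recv_fd. Nothing here is bpf-specific: the same code passes any fd, so the test uses a pipe fd as a stand-in for a bpf object fd.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: send one fd across a connected AF_UNIX socket.
 * On a stream socket the ancillary data must ride along with at least
 * one byte of ordinary data, hence the dummy byte. */
static int send_fd(int sock, int fd)
{
    char byte = 'F';
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        struct cmsghdr hdr;                 /* forces cmsghdr alignment */
        char buf[CMSG_SPACE(sizeof(int))];
    } ctrl;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    memset(&ctrl, 0, sizeof(ctrl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;           /* payload is an array of fds */
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical helper: receive one fd; returns the new fd or -1. */
static int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        struct cmsghdr hdr;
        char buf[CMSG_SPACE(sizeof(int))];
    } ctrl;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    if (recvmsg(sock, &msg, 0) != 1)
        return -1;
    cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

The receiver gets a new fd number referring to the same open file, so the client can pass it to bpf(2) as if it had opened the object itself.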
> > >
> > > None of such hacks are necessary. People that debug bpf setups with bpftool
> > > can always sudo.
> > >
> > > > There are some downsides:
> > > > -  It can't handle pinned bpf programs.
> > > >    For pinned programs, the user can get them from the pinned files
> > > >    directly, so bpftool can be used in that case, albeit with some
> > > >    complaints.
> > > > -  The user may attach the bpf prog, then remove the pinned file
> > > >    without detaching the prog.
> > > >    That has happened, and this error case can't be handled.
> > > > -  There may be other corner cases it doesn't cover.
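For the pinned case, the object can be opened directly with BPF_OBJ_GET, which checks the permissions of the bpffs file rather than CAP_SYS_ADMIN, and then described with BPF_OBJ_GET_INFO_BY_FD. A minimal sketch, assuming <linux/bpf.h> is available; obj_get_pinned and prog_info_from_pin are hypothetical helper names. As far as I can tell, the only creator-related fields struct bpf_prog_info exposes are created_by_uid and load_time; there is no pid, comm, or cgroup.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/bpf.h>

/* Hypothetical helper: open a pinned BPF object by its bpffs path.
 * Returns a new fd, or -1 with errno set. */
static int obj_get_pinned(const char *path)
{
    union bpf_attr attr;

    memset(&attr, 0, sizeof(attr));
    attr.pathname = (__u64)(uintptr_t)path;
    return (int)syscall(__NR_bpf, BPF_OBJ_GET, &attr, sizeof(attr));
}

/* Hypothetical helper: fill *info (id, name, created_by_uid, load_time, ...)
 * for the prog pinned at path. Returns 0, or -1 with errno set. */
static int prog_info_from_pin(const char *path, struct bpf_prog_info *info)
{
    union bpf_attr attr;
    int fd, ret, saved;

    fd = obj_get_pinned(path);
    if (fd < 0)
        return -1;

    memset(info, 0, sizeof(*info));
    memset(&attr, 0, sizeof(attr));
    attr.info.bpf_fd = fd;
    attr.info.info_len = sizeof(*info);
    attr.info.info = (__u64)(uintptr_t)info;
    ret = (int)syscall(__NR_bpf, BPF_OBJ_GET_INFO_BY_FD, &attr, sizeof(attr));
    saved = errno;                 /* keep the bpf(2) errno across close() */
    close(fd);
    errno = saved;
    return ret < 0 ? -1 : 0;
}
```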
> > > >
> > > > There's a way to improve this, but it also requires a kernel
> > > > change: we can use the otherwise wasted btf->name space.
> > > >
> > > > diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> > > > index b7e5a55..59d73a3 100644
> > > > --- a/kernel/bpf/btf.c
> > > > +++ b/kernel/bpf/btf.c
> > > > @@ -5542,6 +5542,8 @@ static struct btf *btf_parse(bpfptr_t btf_data, u32 btf_data_size,
> > > >                 err = -ENOMEM;
> > > >                 goto errout;
> > > >         }
> > > > +       snprintf(btf->name, sizeof(btf->name), "%s-%d-%llu", current->comm,
> > > > +                        current->pid, cgroup_id(task_cgroup(current, cpu_cgrp_id)));
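A user-space model of the proposed tag, with format specifiers matched to the argument types (pid is an int; cgroup_id() returns a u64). It assumes btf->name is MODULE_NAME_LEN bytes, 56 on 64-bit, and a 16-byte TASK_COMM_LEN; format_btf_tag is a hypothetical name. Under those assumptions the worst case (15-char comm, 7-digit pid, 20-digit cgroup id) is 15 + 1 + 7 + 1 + 20 = 44 characters, so the tag would fit without truncation.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define BTF_TAG_LEN 56  /* assumption: MODULE_NAME_LEN (size of btf->name) on 64-bit */
#define COMM_LEN    16  /* TASK_COMM_LEN, including the trailing NUL */

/* Hypothetical model of the "comm-pid-cgroupid" tag. Returns the length
 * snprintf wanted to write, so a result >= len means it was truncated. */
static int format_btf_tag(char *buf, size_t len, const char *comm,
                          int pid, uint64_t cgroup_id)
{
    /* %.*s caps comm at 15 chars like task->comm; PRIu64 matches u64 */
    return snprintf(buf, len, "%.*s-%d-%" PRIu64,
                    COMM_LEN - 1, comm, pid, cgroup_id);
}
```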
> > >
> > > Unnecessary.
> > > comm, pid, cgroup can be printed by bpftool without changing the kernel.
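One way to print comm, pid, and cgroup for the holders of a prog without a kernel change is to scan /proc: the /proc/<pid>/fdinfo entry of a bpf prog fd (and of a bpf link fd) contains a `prog_id:` line. bpftool itself, as far as I know, collects pids with a BPF task_file iterator instead; this sketch takes the simpler /proc route, and all helper names are hypothetical. Reading another process's fdinfo needs ptrace-read access to it, and the scan only finds processes that still hold an fd, which is the limitation raised next.

```c
#define _GNU_SOURCE
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return the value of the "prog_id:" line in fdinfo text, or 0 if absent.
 * BPF prog IDs start at 1, so 0 never names a real prog. */
static unsigned int fdinfo_prog_id(const char *text)
{
    const char *line = text;

    while (line && *line) {
        if (strncmp(line, "prog_id:", 8) == 0)
            return (unsigned int)strtoul(line + 8, NULL, 10);
        line = strchr(line, '\n');
        if (line)
            line++;
    }
    return 0;
}

/* Copy the first line of a small file into out, without the newline. */
static void read_first_line(const char *path, char *out, size_t len)
{
    FILE *f = fopen(path, "r");

    snprintf(out, len, "?");
    if (!f)
        return;
    if (fgets(out, (int)len, f))
        out[strcspn(out, "\n")] = '\0';
    fclose(f);
}

/* Print "comm(pid) cgroup" for each process holding an fd to prog `id`.
 * The cgroup column is the first line of /proc/<pid>/cgroup (the single
 * "0::<path>" line on a pure cgroup v2 host). Returns the number of
 * processes found, or -1 if /proc can't be opened. */
static int print_prog_holders(unsigned int id)
{
    DIR *proc;
    struct dirent *pe;
    int found = 0;

    if (id == 0)
        return 0;
    proc = opendir("/proc");
    if (!proc)
        return -1;
    while ((pe = readdir(proc)) != NULL) {
        char path[640], buf[4096], comm[64], cgrp[512];
        DIR *fds;
        struct dirent *fe;

        if (!isdigit((unsigned char)pe->d_name[0]))
            continue;                          /* not a pid directory */
        snprintf(path, sizeof(path), "/proc/%s/fdinfo", pe->d_name);
        fds = opendir(path);
        if (!fds)
            continue;                          /* exited, or no access */
        while ((fe = readdir(fds)) != NULL) {
            FILE *f;
            size_t n;

            if (fe->d_name[0] == '.')
                continue;
            snprintf(path, sizeof(path), "/proc/%s/fdinfo/%s",
                     pe->d_name, fe->d_name);
            f = fopen(path, "r");
            if (!f)
                continue;
            n = fread(buf, 1, sizeof(buf) - 1, f);
            fclose(f);
            buf[n] = '\0';
            if (fdinfo_prog_id(buf) != id)
                continue;
            snprintf(path, sizeof(path), "/proc/%s/comm", pe->d_name);
            read_first_line(path, comm, sizeof(comm));
            snprintf(path, sizeof(path), "/proc/%s/cgroup", pe->d_name);
            read_first_line(path, cgrp, sizeof(cgrp));
            printf("%s(%s) %s\n", comm, pe->d_name, cgrp);
            found++;
            break;                             /* one line per process */
        }
        closedir(fds);
    }
    closedir(proc);
    return found;
}
```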
> >
> > Some questions:
> > - What if the process exits after attaching the bpf prog and the prog
> > is not auto-detachable?
> >   For example, a reuseport bpf prog is not auto-detachable. After
> > pinning a reuseport bpf prog, a task can attach it through the pinned
> > bpf file, but if the task forgets to detach it and the pinned file is
> > removed, there seems to be no way to figure out which task or
> > cgroup the prog belongs to...
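A sketch of that scenario, assuming <linux/bpf.h> and a socket already set up with SO_REUSEPORT; attach_pinned_reuseport is a hypothetical name. Once setsockopt(SO_ATTACH_REUSEPORT_EBPF) succeeds, the socket's reuseport group holds its own reference to the prog, so closing the prog fd and later unlinking the pin leave it attached, with no pin and no prog fd left to trace back to whoever attached it.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/syscall.h>
#include <linux/bpf.h>

#ifndef SO_ATTACH_REUSEPORT_EBPF
#define SO_ATTACH_REUSEPORT_EBPF 52   /* asm-generic value, for old headers */
#endif

/* Hypothetical helper: open the prog pinned at pin_path and attach it to
 * sock's reuseport group. Returns 0, or -1 with errno set. */
static int attach_pinned_reuseport(int sock, const char *pin_path)
{
    union bpf_attr attr;
    int prog_fd, ret, saved;

    memset(&attr, 0, sizeof(attr));
    attr.pathname = (__u64)(uintptr_t)pin_path;
    prog_fd = (int)syscall(__NR_bpf, BPF_OBJ_GET, &attr, sizeof(attr));
    if (prog_fd < 0)
        return -1;

    ret = setsockopt(sock, SOL_SOCKET, SO_ATTACH_REUSEPORT_EBPF,
                     &prog_fd, sizeof(prog_fd));
    saved = errno;
    /* On success the reuseport group keeps its own prog reference, so
     * this close() (and a later unlink of pin_path) does not detach it. */
    close(prog_fd);
    errno = saved;
    return ret;
}
```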
>
> you're saying that there is a bpf prog in the kernel without
> a corresponding user space?

No, it does correspond to user space. For example, it may correspond
to a socket fd or a cgroup fd.

> Meaning no user space process has an FD
> that points to this prog or an FD to a map that this prog is using?
> In such a case this is truly a kernel bpf prog. It doesn't belong to a cgroup.
>

Even if it is a kernel bpf prog, it was created by a process. The user
needs to know which one created it.

> > - Could you please explain in detail how to get the comm, pid, or cgroup
> > from a pinned bpffs file?
>
> A pinned bpf prog, and no user space holds an FD to it?
> It's not part of any cgroup. Nothing to print.

As I explained above, even if no user space holds it, the user needs
to know where it came from: for example, whether it is expected, and
which process created it.

-- 
Regards
Yafang