Re: [RFC PATCH bpf-next] bpf: Allow get bpf object with CAP_BPF

Hao Luo <haoluo@xxxxxxxxxx> · Wed, 30 Nov 2022 16:37:53 -0800

On Wed, Nov 30, 2022 at 10:07 AM Song Liu <song@xxxxxxxxxx> wrote:
>
> On Wed, Nov 30, 2022 at 3:59 AM Yafang Shao <laoar.shao@xxxxxxxxx> wrote:
> >
> [...]
> > I understand that allowing ID->FD transition for CAP_SYS_ADMIN only is
> > for security.
> > But it also prevents the user from transiting its own bpf object ID,
> > that is a problem.
> >
> > > From the commit message, I'm not clear how BPF is debugged in
> > > containers in your use case. Maybe the debugging process should be
> > > required to have CAP_SYS_ADMIN?
> > >
> >
> > Some container users will run bpf programs in their container,
> > sometimes they want to check the bpf objects created by themselves by
> > using bpftool or read/write the bpf maps with their own tools. But if
> > the bpf objects are not pinned, the only way to get these bpf objects
> > is via SCM_RIGHTS.
> > There should be a general way to get the FD of their own objects when
> > CAP_BPF is enabled.
> > With CAP_SYS_ADMIN, the container user can do almost anything, which
> > is very dangerous.
> > While with CAP_BPF, the risk can be kept within BPF.
> >
> > I think we should improve this situation by allowing the user to
> > transit its own bpf object IDs.
> > There are some possible solutions,
> > 1. introduce BPF_ID namespace
> >     Let's use namespace to isolate the bpf object ID instead of
> > preventing them from reading all IDs.
> > 2. introduce a global sysctl knob to allow users to do the ID->FD transition
> >     for example, introduce a new value into unprivileged_bpf_disabled.
> >     -0 Unprivileged calls to ``bpf()`` are enabled
> >     +0 Unprivileged calls to ``bpf()`` are enabled except the calls
> >     +  which explicitly requires ``CAP_BPF`` or ``CAP_SYS_ADMIN``
> >      1 Unprivileged calls to ``bpf()`` are disabled without recovery
> >      2 Unprivileged calls to ``bpf()`` are disabled
> >     +3 All unprivileged calls to ``bpf()`` are enabled
> >
> > WDYT ?
>
> Personally, I think some namespace might be the solution we need.
> But adding a namespace is a lot of work, so we need to make sure to
> do it correctly.
>
> This might be a good topic to discuss in the BPF office hour.
>

I think a namespace is preferable. A discussion in the BPF office
hour sounds good.

Here are my thoughts:

1. What does the BPF_ID namespace look like? Will it be like the PID
namespace, remapping IDs in each namespace, or will it just restrict
the object IDs visible to users?

2. What's wrong with passing an FD? Is it really necessary to introduce
a namespace for this purpose?

3. IIRC, Song proposed introducing a namespace for BPF isolation, not
just isolating IDs [1]. How does it relate to the BPF_ID namespace?

[1] https://lore.kernel.org/all/CAPhsuW6c17p3XkzSxxo7YBW9LHjqerOqQvt7C1+S--8C9omeng@xxxxxxxxxxxxxx/
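
For context on point 2, FD passing over an AF_UNIX socket with
SCM_RIGHTS can be sketched roughly as below. This is only an
illustration, not code from the patch: send_fd() and recv_fd() are
hypothetical helper names, and the sketch assumes the process that
owns the bpf object and the debugging tool already share a connected
Unix socket. The FD being passed could be any file descriptor,
including a bpf map or prog FD.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: send one FD over a connected AF_UNIX socket.
 * Returns 0 on success, -1 on failure. */
static int send_fd(int sock, int fd)
{
	char byte = 'F';	/* at least one byte of real data must be sent */
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;	/* forces correct alignment */
	} u;
	struct msghdr msg;
	struct cmsghdr *cmsg;

	memset(&msg, 0, sizeof(msg));
	memset(&u, 0, sizeof(u));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical helper: receive one FD sent by send_fd().
 * Returns the new FD (valid in the receiving process), or -1. */
static int recv_fd(int sock)
{
	char byte;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;
	} u;
	struct msghdr msg;
	struct cmsghdr *cmsg;
	int fd;

	memset(&msg, 0, sizeof(msg));
	memset(&u, 0, sizeof(u));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	if (recvmsg(sock, &msg, 0) <= 0)
		return -1;

	cmsg = CMSG_FIRSTHDR(&msg);
	if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS ||
	    cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
		return -1;

	memcpy(&fd, CMSG_DATA(cmsg), sizeof(fd));
	return fd;
}
```

The receiver needs no extra capability for this, but the owner has to
cooperate by listening on a socket, which is the inconvenience raised
earlier in the thread.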