Re: [RFC PATCH bpf-next] bpf: Allow get bpf object with CAP_BPF

Yafang Shao <laoar.shao@xxxxxxxxx> · Wed, 30 Nov 2022 19:58:51 +0800

On Wed, Nov 30, 2022 at 8:44 AM Hao Luo <haoluo@xxxxxxxxxx> wrote:
>
> On Tue, Nov 29, 2022 at 8:16 AM Yafang Shao <laoar.shao@xxxxxxxxx> wrote:
> >
> > In a containerized environment, if a container is not privileged but
> > has CAP_BPF, it is not easy to debug bpf objects created in this
> > container, let alone use bpftool, because these bpf objects are
> > invisible unless they are pinned in bpffs. Currently we have to
> > interact with the process which created these bpf objects to get the
> > information. It would be better if we could control access to each
> > object the same way we control access to files in bpffs, but for now
> > I think we should allow access to these objects with CAP_BPF.
> >
> > Signed-off-by: Yafang Shao <laoar.shao@xxxxxxxxx>
> > ---
> >  kernel/bpf/syscall.c | 10 +++++-----
> >  1 file changed, 5 insertions(+), 5 deletions(-)
> >
>
> As far as I can tell, requiring CAP_SYS_ADMIN on iterating IDs and
> converting IDs to FDs is intended and is an important design in BPF's
> security model [1]. So this change does not look good.
>

I understand that restricting the ID->FD conversion to CAP_SYS_ADMIN is
a security measure.
But it also prevents users from converting the IDs of their own bpf
objects into FDs, which is a problem.

> From the commit message, I'm not clear how BPF is debugged in
> containers in your use case. Maybe the debugging process should be
> required to have CAP_SYS_ADMIN?
>

Some container users run bpf programs in their containers. Sometimes
they want to inspect the bpf objects they created with bpftool, or
read/write the bpf maps with their own tools. But if the bpf objects
are not pinned, the only way to get them is via SCM_RIGHTS.
There should be a general way for users to get the FDs of their own
objects when CAP_BPF is enabled.
With CAP_SYS_ADMIN, the container user can do almost anything, which
is very dangerous.
With CAP_BPF, on the other hand, the risk can be kept within BPF.
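
For reference, the SCM_RIGHTS route mentioned above looks roughly like
the sketch below. It is generic fd passing over an AF_UNIX socket, not
code from the patch; the helper names send_fd() and recv_fd() are made
up for illustration, and the process that owns the bpf object has to
cooperate by running the sending side.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: send one fd over a connected AF_UNIX socket.
 * Returns 0 on success, -1 on failure. */
static int send_fd(int sock, int fd)
{
	char dummy = 'x';	/* at least one byte of payload is sent */
	struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
	union {
		struct cmsghdr align;	/* forces correct alignment */
		char buf[CMSG_SPACE(sizeof(int))];
	} u;
	struct msghdr msg;
	struct cmsghdr *cmsg;

	memset(&u, 0, sizeof(u));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical helper: receive one fd sent by send_fd().
 * Returns the new fd, or -1 on failure. */
static int recv_fd(int sock)
{
	char dummy;
	struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
	union {
		struct cmsghdr align;
		char buf[CMSG_SPACE(sizeof(int))];
	} u;
	struct msghdr msg;
	struct cmsghdr *cmsg;
	int fd = -1;

	memset(&u, 0, sizeof(u));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	if (recvmsg(sock, &msg, 0) != 1)
		return -1;

	cmsg = CMSG_FIRSTHDR(&msg);
	if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS ||
	    cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
		return -1;

	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}
```

The owner would call send_fd() with the map or prog fd, and the
debugging tool would call recv_fd() on the other end of the socket.
This works, but every application has to implement the sending side
itself.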

I think we should improve this situation by allowing users to convert
the IDs of their own bpf objects into FDs.
There are some possible solutions:
1. Introduce a BPF_ID namespace.
    Use a namespace to isolate bpf object IDs, instead of preventing
    users from reading any IDs.
2. Introduce a global sysctl knob to allow users to do the ID->FD
   conversion.
    For example, add a new value to unprivileged_bpf_disabled:
    -0 Unprivileged calls to ``bpf()`` are enabled
    +0 Unprivileged calls to ``bpf()`` are enabled except the calls
    +  which explicitly require ``CAP_BPF`` or ``CAP_SYS_ADMIN``
     1 Unprivileged calls to ``bpf()`` are disabled without recovery
     2 Unprivileged calls to ``bpf()`` are disabled
    +3 All unprivileged calls to ``bpf()`` are enabled
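
To make option 2 more concrete, here is one possible reading of the
proposed values as a permission check. This is only a sketch: the enum
names and id_to_fd_allowed() are hypothetical and do not exist in the
kernel, and it assumes that value 3 lifts the CAP_SYS_ADMIN requirement
on the ID->FD conversion, which the proposal above does not spell out.

```c
#include <stdbool.h>

/* Hypothetical names for the proposed unprivileged_bpf_disabled values. */
enum unpriv_bpf_mode {
	UNPRIV_BPF_ENABLED_EXCEPT_CAPS = 0, /* calls needing caps still need them */
	UNPRIV_BPF_DISABLED_LOCKED     = 1, /* disabled without recovery */
	UNPRIV_BPF_DISABLED            = 2, /* disabled */
	UNPRIV_BPF_ENABLED_ALL         = 3, /* proposed new value */
};

/* Assumed policy: CAP_SYS_ADMIN always allows the ID->FD conversion;
 * without it, only the proposed value 3 does. */
static bool id_to_fd_allowed(int mode, bool has_cap_sys_admin)
{
	if (has_cap_sys_admin)
		return true;
	return mode == UNPRIV_BPF_ENABLED_ALL;
}
```

Under this reading, the default value 0 keeps today's behavior, and an
administrator has to opt in to value 3 explicitly.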

WDYT?

-- 
Regards
Yafang