On Wed, Nov 30, 2022 at 8:44 AM Hao Luo <haoluo@xxxxxxxxxx> wrote:
>
> On Tue, Nov 29, 2022 at 8:16 AM Yafang Shao <laoar.shao@xxxxxxxxx> wrote:
> >
> > In the containerized envriomentation, if a container is not
> > privileged but with CAP_BPF, it is not easy to debug bpf created in this
> > container, let alone using bpftool. Because these bpf objects are
> > invisible if they are not pinned in bpffs. Currently we have to
> > interact with the process which creates these bpf objects to get the
> > information. It may be better if we can control the access to each
> > object the same way as we control the file in bpffs, but now I think we
> > should allow the accessibility of these objects with CAP_BPF.
> >
> > Signed-off-by: Yafang Shao <laoar.shao@xxxxxxxxx>
> > ---
> > kernel/bpf/syscall.c | 10 +++++-----
> > 1 file changed, 5 insertions(+), 5 deletions(-)
> >
>
> As far as I can tell, requiring CAP_SYS_ADMIN on iterating IDs and
> converting IDs to FDs is intended and is an important design in BPF's
> security model [1]. So this change does not look good.
>

I understand that restricting the ID->FD transition to CAP_SYS_ADMIN is
a security measure. But it also prevents a user from converting the IDs
of their own bpf objects into FDs, and that is a problem.

> From the commit message, I'm not clear how BPF is debugged in
> containers in your use case. Maybe the debugging process should be
> required to have CAP_SYS_ADMIN?
>

Some container users run bpf programs inside their containers.
Sometimes they want to inspect the bpf objects they created with
bpftool, or read/write their bpf maps with their own tools. But if the
bpf objects are not pinned, the only way to get at them is to have the
creating process pass the FDs over SCM_RIGHTS. There should be a
general way for users to get the FDs of their own objects when CAP_BPF
is enabled.

With CAP_SYS_ADMIN, the container user can do almost anything, which is
very dangerous. With CAP_BPF, the risk is confined to BPF.
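
To show why the SCM_RIGHTS route is awkward, here is a minimal sketch of
what it requires today. It assumes the process that created the bpf
object and the debugging tool already share a connected AF_UNIX socket.
The helper names send_fd() and recv_fd() are made up for illustration
and are not part of the patch or of any library. The FD being passed can
be any open FD, including a bpf map or prog FD.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: send one FD over a connected AF_UNIX socket.
 * Returns 0 on success, -1 on error. */
static int send_fd(int sock, int fd_to_send)
{
	char byte = 'F';	/* one byte of ordinary data carries the control message */
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;	/* forces correct alignment of buf */
	} ctrl;
	struct msghdr msg;
	struct cmsghdr *cmsg;

	assert(fd_to_send >= 0);
	memset(&ctrl, 0, sizeof(ctrl));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctrl.buf;
	msg.msg_controllen = sizeof(ctrl.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd_to_send, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical helper: receive one FD sent by send_fd().
 * Returns the new FD in this process, or -1 on error. */
static int recv_fd(int sock)
{
	char byte;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;
	} ctrl;
	struct msghdr msg;
	struct cmsghdr *cmsg;
	int fd;

	memset(&ctrl, 0, sizeof(ctrl));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctrl.buf;
	msg.msg_controllen = sizeof(ctrl.buf);

	if (recvmsg(sock, &msg, 0) <= 0)
		return -1;

	cmsg = CMSG_FIRSTHDR(&msg);
	if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS ||
	    cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
		return -1;

	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}
```

Both sides have to cooperate: the creating process must call something
like send_fd() and the tool must call recv_fd(). A generic tool such as
bpftool cannot do this on its own.
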

I think we should improve this situation by allowing users to do the
ID->FD transition for their own bpf objects. There are some possible
solutions:

1. Introduce a BPF_ID namespace
   Use a namespace to isolate bpf object IDs, instead of preventing
   users from reading any IDs at all.

2. Introduce a global sysctl knob that allows users to do the ID->FD
   transition
   For example, introduce a new value for unprivileged_bpf_disabled:

-0 Unprivileged calls to ``bpf()`` are enabled
+0 Unprivileged calls to ``bpf()`` are enabled except the calls
+  which explicitly require ``CAP_BPF`` or ``CAP_SYS_ADMIN``
 1 Unprivileged calls to ``bpf()`` are disabled without recovery
 2 Unprivileged calls to ``bpf()`` are disabled
+3 All unprivileged calls to ``bpf()`` are enabled

WDYT?

--
Regards
Yafang