On Wed, Nov 30, 2022 at 3:59 AM Yafang Shao <laoar.shao@xxxxxxxxx> wrote:
> [...]
> I understand that allowing ID->FD transition for CAP_SYS_ADMIN only is
> for security.
> But it also prevents the user from transiting its own bpf object ID,
> that is a problem.
>
> > From the commit message, I'm not clear how BPF is debugged in
> > containers in your use case. Maybe the debugging process should be
> > required to have CAP_SYS_ADMIN?
> >
>
> Some container users will run bpf programs in their container,
> sometimes they want to check the bpf objects created by themselves by
> using bpftool or read/write the bpf maps with their own tools. But if
> the bpf objects are not pinned, the only way to get these bpf objects
> is via SCM_RIGHTS.
> There should be a general way to get the FD of their own objects when
> CAP_BPF is enabled.
> With CAP_SYS_ADMIN, the container user can do almost anything, which
> is very dangerous.
> While with CAP_BPF, the risk can be kept within BPF.
>
> I think we should improve this situation by allowing the user to
> transit its own bpf object IDs.
> There are some possible solutions,
> 1. introduce BPF_ID namespace
>    Let's use namespace to isolate the bpf object ID instead of
>    preventing them from reading all IDs.
> 2. introduce a global sysctl knob to allow users to do the ID->FD transition
>    for example, introduce a new value into unprivileged_bpf_disabled.
>    -0 Unprivileged calls to ``bpf()`` are enabled
>    +0 Unprivileged calls to ``bpf()`` are enabled except the calls
>    +  which explicitly requires ``CAP_BPF`` or ``CAP_SYS_ADMIN``
>     1 Unprivileged calls to ``bpf()`` are disabled without recovery
>     2 Unprivileged calls to ``bpf()`` are disabled
>    +3 All unprivileged calls to ``bpf()`` are enabled
>
> WDYT ?

Personally, I think some namespace might be the solution we need. But
adding a namespace is a lot of work, so we need to make sure to do it
correctly. This might be a good topic to discuss in the BPF office hour.

Thanks,
Song
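
For reference, the ID->FD transition discussed above boils down to one
bpf() command. Below is a minimal sketch, assuming a Linux system with the
kernel UAPI headers installed. The helper name map_fd_from_id() is made up
for illustration and is not part of the patch under discussion; libbpf
offers an equivalent wrapper, bpf_map_get_fd_by_id(). At the time of this
thread the kernel is expected to reject the call without CAP_SYS_ADMIN,
which is the restriction being debated.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/bpf.h>

/*
 * Hypothetical helper (not from the thread): ask the kernel for an FD
 * that refers to the BPF map with the given ID.
 *
 * Returns the new FD on success, or -1 with errno set. A caller that
 * lacks the required capability typically sees EPERM; a privileged
 * caller asking for an ID that does not exist typically sees ENOENT.
 */
static int map_fd_from_id(uint32_t id)
{
	union bpf_attr attr;

	/* Unused fields of bpf_attr must be zero for this command. */
	memset(&attr, 0, sizeof(attr));
	attr.map_id = id;

	return (int)syscall(__NR_bpf, BPF_MAP_GET_FD_BY_ID, &attr, sizeof(attr));
}
```

A tool run by the container user would call map_fd_from_id() with an ID it
learned elsewhere (for example from its own earlier BPF_OBJ_GET_INFO_BY_FD
call) and check errno on failure to tell "not permitted" apart from
"no such map".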