On Sat, Sep 9, 2023 at 8:18 PM Yafang Shao <laoar.shao@xxxxxxxxx> wrote:
>
> On Sat, Sep 9, 2023 at 2:09 AM Alexei Starovoitov
> <alexei.starovoitov@xxxxxxxxx> wrote:
> >
> > On Thu, Sep 7, 2023 at 7:54 PM Yafang Shao <laoar.shao@xxxxxxxxx> wrote:
> > >
> > > On Thu, Sep 7, 2023 at 10:41 PM Michal Koutný <mkoutny@xxxxxxxx> wrote:
> > > >
> > > > Hello Yafang.
> > > >
> > > > On Sun, Sep 03, 2023 at 02:27:55PM +0000, Yafang Shao <laoar.shao@xxxxxxxxx> wrote:
> > > > > In our specific use case, we intend to use
> > > > > bpf_current_task_under_cgroup() to audit whether the current task
> > > > > resides within specific containers.
> > > >
> > > > I wonder -- how does this work in practice?
> > >
> > > In our practice, the cgroup_array map serves as a shared map utilized
> > > by both our LSM programs and the target pods, as follows:
> > >
> > >                 ----------------
> > >                 |  target pod  |
> > >                 ----------------
> > >                        |
> > >                        |
> > >                        V                      ----------------
> > >          /sys/fs/bpf/cgroup_array  <---       |  LSM progs   |
> > >                                               ----------------
> > >
> > > Within the target pods, we employ a script to update the pod's cgroup
> > > file descriptor into the cgroup_array, for instance:
> > >
> > >     cgrp_fd = open("/sys/fs/cgroup/cpu");
> > >     cgrp_map_fd = bpf_obj_get("/sys/fs/bpf/cgroup_array");
> > >     bpf_map_update_elem(cgrp_map_fd, &app_idx, &cgrp_fd, 0);
> > >
> > > Next, we validate the contents of the cgroup_array within our LSM
> > > programs, as follows:
> > >
> > >     if (!bpf_current_task_under_cgroup(&cgroup_array, app_idx))
> > >         return -1;
> > >
> > > Within our Kubernetes deployment system, we inject this script into
> > > the target pods only if specific annotations, such as "bpf_audit",
> > > are present. Consequently, users do not need to modify their code
> > > manually; this process is handled automatically.
> > >
> > > Within our Kubernetes environment, there is only a single instance of
> > > these target pods on each host. Consequently, we can conveniently use
> > > the array index as the application ID. However, in scenarios where
> > > multiple instances run on a single host, you will need to manage the
> > > mapping of instances to array indexes yourself. For such cases a
> > > cgroup_hash may be a more suitable approach, although that is a
> > > separate discussion altogether.
> >
> > Is there a reason you cannot use bpf_get_current_cgroup_id()
> > to associate the task with a cgroup in your lsm prog?
>
> Using the cgroup_id as the key serves as a temporary workaround;
> nevertheless, employing bpf_get_current_cgroup_id() is impractical
> because it only supports cgroup2.
>
> To acquire the cgroup_id, we can resort to open coding, as exemplified
> below:
>
>     task = bpf_get_current_task_btf();
>     cgroups = task->cgroups;
>     cgroup = cgroups->subsys[cpu_cgrp_id]->cgroup;
>     key = cgroup->kn->id;
>
> Nonetheless, creating an open-coded version of
> bpf_get_current_ancestor_cgroup_id() is infeasible, since the BPF
> verifier prohibits access to "cgrp->ancestors[ancestor_level]".

Can both helpers be extended to support v1 or not?
I mean, can a task be part of a v1 and a v2 hierarchy at the same time?
If not, then bpf_get_current_cgroup_id() can fall back to what you
describe above and return the cgroup_id. The same would apply to
bpf_get_current_ancestor_cgroup_id(). If that fallback is not possible,
two new kfuncs for v1 could be another option.

A prog_array-style map for cgroups is an old design.
We can and should provide a more flexible interface nowadays.
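
[Editor's note, not part of the thread: below is a minimal sketch of the
LSM-side check Yafang describes, written as a libbpf-style BPF program.
The map definition, the lsm/bprm_check_security hook, the pin-by-name
setting, and the APP_IDX slot are illustrative assumptions, not taken
from the actual patch set.]

    /* Sketch only: audit tasks that belong to the cgroup whose fd was
     * stored at APP_IDX by the pod-side script. Assumes the map is
     * pinned by name under /sys/fs/bpf so the script can update it. */
    #include "vmlinux.h"
    #include <bpf/bpf_helpers.h>
    #include <bpf/bpf_tracing.h>

    #define APP_IDX 0   /* hypothetical slot written by the pod-side script */

    struct {
        __uint(type, BPF_MAP_TYPE_CGROUP_ARRAY);
        __uint(max_entries, 16);
        __uint(key_size, sizeof(__u32));
        __uint(value_size, sizeof(__u32));
        __uint(pinning, LIBBPF_PIN_BY_NAME);  /* pinned as /sys/fs/bpf/cgroup_array */
    } cgroup_array SEC(".maps");

    SEC("lsm/bprm_check_security")
    int BPF_PROG(audit_exec, struct linux_binprm *bprm)
    {
        /* Skip tasks that are not under the target pod's cgroup. */
        if (!bpf_current_task_under_cgroup(&cgroup_array, APP_IDX))
            return 0;

        /* ... auditing / policy logic for the target pod goes here ... */
        return 0;
    }

    char LICENSE[] SEC("license") = "GPL";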
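
[Editor's note, not part of the thread: below, the open-coded cgroup1 id
lookup quoted in the thread is expanded into a self-contained program
sketch. The hook name, variable names, and the choice of the cpu
controller's hierarchy are assumptions for illustration only.]

    /* Sketch only: read the cgroup1 id of the current task via BTF
     * pointer walking, as described in the thread. */
    #include "vmlinux.h"
    #include <bpf/bpf_helpers.h>
    #include <bpf/bpf_tracing.h>

    SEC("lsm/bprm_check_security")
    int BPF_PROG(cgrp1_id_example, struct linux_binprm *bprm)
    {
        struct task_struct *task = bpf_get_current_task_btf();
        struct cgroup *cgrp;
        __u64 cgrp_id;

        /* Walk task->cgroups->subsys[] to the cgroup attached to the
         * cpu controller's (cgroup1) hierarchy. */
        cgrp = task->cgroups->subsys[cpu_cgrp_id]->cgroup;
        cgrp_id = cgrp->kn->id;

        /* cgrp_id can now serve as a map key. Note, per the thread, an
         * equivalent of bpf_get_current_ancestor_cgroup_id() cannot be
         * open-coded this way, since the verifier rejects access to
         * cgrp->ancestors[ancestor_level]. */
        bpf_printk("cgroup1 id: %llu", cgrp_id);
        return 0;
    }

    char LICENSE[] SEC("license") = "GPL";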