Re: [RFC PATCH bpf-next 0/5] bpf, cgroup: Enable cgroup_array map on cgroup1

On Thu, Sep 7, 2023 at 10:41 PM Michal Koutný <mkoutny@xxxxxxxx> wrote:
>
> Hello Yafang.
>
> On Sun, Sep 03, 2023 at 02:27:55PM +0000, Yafang Shao <laoar.shao@xxxxxxxxx> wrote:
> > In our specific use case, we intend to use bpf_current_under_cgroup() to
> > audit whether the current task resides within specific containers.
>
> I wonder -- how does this work in practice?

In practice, the cgroup_array map is shared between our LSM programs and
the target pods, as illustrated below:

    --------------
    | target pod |
    --------------
           |
           | update
           v                        -------------
    /sys/fs/bpf/cgroup_array  <---  | LSM progs |
                                    -------------

Within the target pods, we run a script that registers the pod's cgroup
file descriptor in the cgroup_array, for instance:

    cgrp_fd = open("/sys/fs/cgroup/cpu", O_RDONLY);          /* the pod's cpu cgroup dir */
    cgrp_map_fd = bpf_obj_get("/sys/fs/bpf/cgroup_array");   /* pinned cgroup_array map  */
    bpf_map_update_elem(cgrp_map_fd, &app_idx, &cgrp_fd, BPF_ANY);
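
For completeness, a fuller sketch of that in-pod script (the error
handling, the app_idx value, and the includes are illustrative
assumptions, not our actual deployment script):

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>
    #include <bpf/bpf.h>

    int main(void)
    {
            __u32 app_idx = 0;      /* assumed index assigned to this application */
            int cgrp_fd, cgrp_map_fd;

            /* The pod's own cgroup v1 cpu hierarchy, as mounted inside the container. */
            cgrp_fd = open("/sys/fs/cgroup/cpu", O_RDONLY);
            if (cgrp_fd < 0) {
                    perror("open cgroup dir");
                    return 1;
            }

            /* The BPF_MAP_TYPE_CGROUP_ARRAY pinned by the LSM program loader. */
            cgrp_map_fd = bpf_obj_get("/sys/fs/bpf/cgroup_array");
            if (cgrp_map_fd < 0) {
                    perror("bpf_obj_get");
                    return 1;
            }

            /* Store the cgroup fd at this application's slot in the array. */
            if (bpf_map_update_elem(cgrp_map_fd, &app_idx, &cgrp_fd, BPF_ANY)) {
                    perror("bpf_map_update_elem");
                    return 1;
            }

            close(cgrp_fd);
            close(cgrp_map_fd);
            return 0;
    }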

Our LSM programs then check whether the current task belongs to one of
the cgroups stored in the cgroup_array, as follows:

     if (!bpf_current_task_under_cgroup(&cgroup_array, app_idx))
            return -1;
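
To make the LSM side concrete, here is a minimal self-contained sketch
(the map definition, the file_open hook, and the program/map names are
illustrative assumptions rather than our actual programs):

    #include "vmlinux.h"
    #include <bpf/bpf_helpers.h>
    #include <bpf/bpf_tracing.h>

    /* Shared with the in-pod script via pinning under /sys/fs/bpf. */
    struct {
            __uint(type, BPF_MAP_TYPE_CGROUP_ARRAY);
            __uint(max_entries, 16);
            __uint(key_size, sizeof(__u32));
            __uint(value_size, sizeof(__u32));
            __uint(pinning, LIBBPF_PIN_BY_NAME);
    } cgroup_array SEC(".maps");

    const volatile __u32 app_idx = 0;   /* assumed index of the audited application */

    SEC("lsm/file_open")
    int BPF_PROG(audit_file_open, struct file *file)
    {
            /* bpf_current_task_under_cgroup() returns 1 when the current task is
             * in the cgroup stored at app_idx; skip tasks outside that cgroup
             * (whether to allow, deny, or only audit here is a policy choice). */
            if (!bpf_current_task_under_cgroup(&cgroup_array, app_idx))
                    return 0;

            /* ... container-specific ACL / audit logic ... */
            return 0;
    }

    char LICENSE[] SEC("license") = "GPL";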

Within our Kubernetes deployment system, we will inject this script
into the target pods only if specific annotations, such as
"bpf_audit," are present. Consequently, users do not need to manually
modify their code; this process will be handled automatically.

Within our Kubernetes environment, there is only a single instance of
these target pods on each host. Consequently, we can conveniently
utilize the array index as the application ID. However, in scenarios
where you have multiple instances running on a single host, you will
need to manage the mapping of instances to array indexes
independently. For cases with multiple instances, a cgroup_hash may be
a more suitable approach, although that is a separate discussion
altogether.

>
> If it's systemd hybrid setup, you can get the information from the
> unified hierarchy which represents the container membership.
>
> If it's a setup without the unified hierarchy, you have to pick one
> hierarchy as a representation of the membership. Which one will it be?

We utilize the CPU subsystem, and all of our pods have this cgroup
subsystem enabled.

>
> > Subsequently, we can use this information to create distinct ACLs within
> > our LSM BPF programs, enabling us to control specific operations performed
> > by these tasks.
>
> If one was serious about container-based ACLs, it'd be best to have a
> dedicated and maintained hierarchy for this (I mean a named v1
> hierarchy). But your implementation omits this, so this hints to me that
> this scenario may already be better covered with querying the unified
> hierarchy.
>
> > Considering the widespread use of cgroup1 in container environments,
> > coupled with the considerable time it will take to transition to cgroup2,
> > implementing this change will significantly enhance the utility of BPF
> > in container scenarios.
>
> If a change like this is not accepted, will it make the transition
> period shorter? (As written above, the unified hierarchy seems a better
> fit for your use case.)

If this change is not accepted upstream, we will have to carry and
maintain it independently in our local kernel :(

-- 
Regards
Yafang



