On Thu, Sep 7, 2023 at 10:41 PM Michal Koutný <mkoutny@xxxxxxxx> wrote:
>
> Hello Yafang.
>
> On Sun, Sep 03, 2023 at 02:27:55PM +0000, Yafang Shao <laoar.shao@xxxxxxxxx> wrote:
> > In our specific use case, we intend to use bpf_current_under_cgroup() to
> > audit whether the current task resides within specific containers.
>
> I wonder -- how does this work in practice?

In our practice, the cgroup_array map is a shared map used by both
our LSM programs and the target pods, as follows:

    ----------------
    |  target pod  |
    ----------------
            |
            |
            V                       -------------
    /sys/fs/bpf/cgroup_array  <---  | LSM progs |
                                    -------------

Within each target pod, a script stores the pod's cgroup file
descriptor in the cgroup_array, for instance:

    cgrp_fd = open("/sys/fs/cgroup/cpu", O_RDONLY);
    cgrp_map_fd = bpf_obj_get("/sys/fs/bpf/cgroup_array");
    bpf_map_update_elem(cgrp_map_fd, &app_idx, &cgrp_fd, 0);

Our LSM programs then check the current task against the
cgroup_array, as follows:

    if (!bpf_current_task_under_cgroup(&cgroup_array, app_idx))
        return -1;

Our Kubernetes deployment system injects this script into a target
pod only if specific annotations, such as "bpf_audit", are present.
Users therefore do not need to modify their code; the process is
handled automatically.

In our Kubernetes environment there is only a single instance of
these target pods on each host, so we can simply use the array index
as the application ID. If you run multiple instances on a single
host, you will need to manage the mapping of instances to array
indexes yourself. For that case a cgroup_hash may be a more suitable
approach, but that is a separate discussion.

>
> If it's systemd hybrid setup, you can get the information from the
> unified hierarchy which represents the container membership.
>
> If it's a setup without the unified hierarchy, you have to pick one
> hierarchy as a representation of the membership. Which one will it be?

We utilize the CPU subsystem, and all of our pods have this cgroup
subsystem enabled.

>
> > Subsequently, we can use this information to create distinct ACLs within
> > our LSM BPF programs, enabling us to control specific operations performed
> > by these tasks.
>
> If one was serious about container-based ACLs, it'd be best to have a
> dedicated and maintained hierarchy for this (I mean a named v1
> hierarchy). But your implementation omits this, so this hints to me that
> this scenario may already be better covered with querying the unified
> hierarchy.
>
> > Considering the widespread use of cgroup1 in container environments,
> > coupled with the considerable time it will take to transition to cgroup2,
> > implementing this change will significantly enhance the utility of BPF
> > in container scenarios.
>
> If a change like this is not accepted, will it make the transition
> period shorter? (As written above, the unified hierarchy seems a better
> fit for your use case.)

If that change is not accepted by upstream, we will need to
independently manage and maintain it within our local kernel :(

--
Regards
Yafang