I have made some significant changes on the BPF side of this. I will
send an RFC V2 soon with those changes, incorporating the feedback on
the cgroup side that I got from Tejun. Please hold off on reviewing
this version.

On Mon, May 9, 2022 at 5:18 PM Yosry Ahmed <yosryahmed@xxxxxxxxxx> wrote:
>
> This patch series allows for using bpf to collect hierarchical cgroup
> stats efficiently by integrating with the rstat framework. The rstat
> framework provides an efficient way to collect cgroup stats and
> propagate them through the cgroup hierarchy.
>
> The last patch is a selftest that demonstrates the entire workflow.
> The workflow consists of:
> - bpf programs that collect per-cpu per-cgroup stats (tracing progs).
> - bpf rstat flusher that contains the logic for aggregating stats
>   across cpus and across the cgroup hierarchy.
> - bpf cgroup_iter responsible for outputting the stats to userspace
>   through reading a file in bpffs.
>
> The first 3 patches include the new bpf rstat flusher program type and
> the needed support in rstat code and libbpf. The rstat flusher program
> is a callback that the rstat framework makes to bpf when a stat flush is
> ongoing, similar to the css_rstat_flush() callback that rstat makes to
> cgroup controllers. Each callback is parameterized by a (cgroup, cpu)
> pair that has been updated. The program contains the logic for
> aggregating the stats across cpus and across the cgroup hierarchy.
> These programs can be attached to any cgroup subsystem, not only the
> ones that implement the css_rstat_flush() callback in the kernel. This
> gives bpf programs more flexibility, and more isolation from the kernel
> implementation.
>
> The following 2 patches add necessary helpers for the stats collection
> workflow.
> Helpers that call into cgroup_rstat_updated() and
> cgroup_rstat_flush() are added to allow bpf programs collecting stats to
> tell the rstat framework that a cgroup has been updated, and to allow
> bpf programs outputting stats to tell the rstat framework to flush the
> stats before they are displayed to the user. An additional helper,
> bpf_map_lookup_percpu_elem(), is introduced to allow rstat flusher
> programs to access percpu stats of the cpu being flushed.
>
> The following 3 patches add the cgroup_iter program type (v2). This was
> originally introduced by Hao as a part of a different series [1].
> Their usecase is better showcased as part of this patch series. We also
> make cgroup_get_from_id() cgroup v1 friendly to allow cgroup_iter programs
> to display stats for cgroup v1 as well. This small change makes the
> entire workflow cgroup v1 friendly without any other dedicated changes.
>
> The final patch is a selftest demonstrating the entire workflow with a
> set of bpf programs that collect per-cgroup latency of memcg reclaim.
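
For anyone following along, the collect/flush/read workflow described
above can be sketched as a small user-space model in C. This is not code
from the series and it is not BPF. model_record(), model_flush_one(),
model_flush() and model_read() are hypothetical names that stand in,
respectively, for the tracing progs plus cgroup_rstat_updated(), the
per-(cgroup, cpu) flusher callback, cgroup_rstat_flush(), and the
cgroup_iter read path. The fixed three-cgroup chain and the array sizes
are assumptions to keep the example short, and the model walks all
ancestors directly, which simplifies whatever propagation order the real
rstat code uses.

```c
/* User-space model of an rstat-style flush; not kernel or BPF code. */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_CPUS    4	/* hypothetical sizes for the model */
#define NR_CGROUPS 3

struct cgroup_model {
	int parent;			/* index of the parent, -1 for the root */
	uint64_t percpu[NR_CPUS];	/* per-cpu counters not yet flushed */
	bool updated[NR_CPUS];		/* marks (cgroup, cpu) pairs needing a flush */
	uint64_t total;			/* hierarchical total: self plus descendants */
};

struct cgroup_model cgroups[NR_CGROUPS];

/* Build a simple chain: 0 (root) <- 1 <- 2, with all counters zeroed. */
void model_init(void)
{
	for (int i = 0; i < NR_CGROUPS; i++)
		cgroups[i] = (struct cgroup_model){ .parent = i - 1 };
}

/* Stand-in for a tracing prog: bump a per-cpu counter, mark the pair updated. */
void model_record(int cg, int cpu, uint64_t delta)
{
	assert(cg >= 0 && cg < NR_CGROUPS && cpu >= 0 && cpu < NR_CPUS);
	cgroups[cg].percpu[cpu] += delta;
	cgroups[cg].updated[cpu] = true;
}

/* Stand-in for the flusher callback, invoked once per updated (cgroup, cpu). */
void model_flush_one(int cg, int cpu)
{
	uint64_t delta = cgroups[cg].percpu[cpu];

	cgroups[cg].percpu[cpu] = 0;
	cgroups[cg].updated[cpu] = false;

	/* Add the delta to this cgroup and to every ancestor up to the root. */
	for (int c = cg; c >= 0; c = cgroups[c].parent)
		cgroups[c].total += delta;
}

/* Stand-in for a flush: visit only the pairs that were marked as updated. */
void model_flush(void)
{
	for (int cg = 0; cg < NR_CGROUPS; cg++)
		for (int cpu = 0; cpu < NR_CPUS; cpu++)
			if (cgroups[cg].updated[cpu])
				model_flush_one(cg, cpu);
}

/* Stand-in for the reader: flush first so the reported total is current. */
uint64_t model_read(int cg)
{
	model_flush();
	return cgroups[cg].total;
}
```

In this model, recording 5 and 7 on two cpus of cgroup 2 and 10 on one
cpu of cgroup 1, then calling model_read(), gives 12 for cgroup 2 and 22
for both cgroup 1 and the root. Only pairs marked as updated are visited
during a flush, which is the property that makes this scheme cheap when
most (cgroup, cpu) pairs are idle.
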
>
> [1] https://lore.kernel.org/lkml/20220225234339.2386398-9-haoluo@xxxxxxxxxx/
>
>
> Hao Luo (2):
>   cgroup: Add cgroup_put() in !CONFIG_CGROUPS case
>   bpf: Introduce cgroup iter
>
> Yosry Ahmed (7):
>   bpf: introduce CGROUP_SUBSYS_RSTAT program type
>   cgroup: bpf: flush bpf stats on rstat flush
>   libbpf: Add support for rstat progs and links
>   bpf: add bpf rstat helpers
>   bpf: add bpf_map_lookup_percpu_elem() helper
>   cgroup: add v1 support to cgroup_get_from_id()
>   bpf: add a selftest for cgroup hierarchical stats collection
>
>  include/linux/bpf-cgroup-subsys.h            |  35 ++
>  include/linux/bpf.h                          |   4 +
>  include/linux/bpf_types.h                    |   2 +
>  include/linux/cgroup-defs.h                  |   4 +
>  include/linux/cgroup.h                       |   5 +
>  include/uapi/linux/bpf.h                     |  45 +++
>  kernel/bpf/Makefile                          |   3 +-
>  kernel/bpf/arraymap.c                        |  11 +-
>  kernel/bpf/cgroup_iter.c                     | 148 ++++++++
>  kernel/bpf/cgroup_subsys.c                   | 212 +++++++++++
>  kernel/bpf/hashtab.c                         |  25 +-
>  kernel/bpf/helpers.c                         |  56 +++
>  kernel/bpf/syscall.c                         |   6 +
>  kernel/bpf/verifier.c                        |   6 +
>  kernel/cgroup/cgroup.c                       |  16 +-
>  kernel/cgroup/rstat.c                        |  11 +
>  scripts/bpf_doc.py                           |   2 +
>  tools/include/uapi/linux/bpf.h               |  45 +++
>  tools/lib/bpf/bpf.c                          |   3 +
>  tools/lib/bpf/bpf.h                          |   3 +
>  tools/lib/bpf/libbpf.c                       |  35 ++
>  tools/lib/bpf/libbpf.h                       |   3 +
>  tools/lib/bpf/libbpf.map                     |   1 +
>  .../test_cgroup_hierarchical_stats.c         | 335 ++++++++++++++++++
>  tools/testing/selftests/bpf/progs/bpf_iter.h |   7 +
>  .../selftests/bpf/progs/cgroup_vmscan.c      | 211 +++++++++++
>  26 files changed, 1212 insertions(+), 22 deletions(-)
>  create mode 100644 include/linux/bpf-cgroup-subsys.h
>  create mode 100644 kernel/bpf/cgroup_iter.c
>  create mode 100644 kernel/bpf/cgroup_subsys.c
>  create mode 100644 tools/testing/selftests/bpf/prog_tests/test_cgroup_hierarchical_stats.c
>  create mode 100644 tools/testing/selftests/bpf/progs/cgroup_vmscan.c
>
> --
> 2.36.0.512.ge40c2bad7a-goog
>