Re: [RFC PATCH bpf-next 0/9] bpf: cgroup hierarchical stats collection

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



I have done some significant changes on the BPF side of this. I will
send a RFC V2 soon with those changes and incorporating the feedback
on the cgroup side that I got from Tejun. Hold off on reviewing this
version.


On Mon, May 9, 2022 at 5:18 PM Yosry Ahmed <yosryahmed@xxxxxxxxxx> wrote:
>
> This patch series allows for using bpf to collect hierarchical cgroup
> stats efficiently by integrating with the rstat framework. The rstat
> framework provides an efficient way to collect cgroup stats and
> propagate them through the cgroup hierarchy.
>
> The last patch is a selftest that demonastrates the entire workflow.
> The workflow consists of:
> - bpf programs that collect per-cpu per-cgroup stats (tracing progs).
> - bpf rstat flusher that contains the logic for aggregating stats
>   across cpus and across the cgroup hierarchy.
> - bpf cgroup_iter responsible for outputting the stats to userspace
>   through reading a file in bpffs.
>
> The first 3 patches include the new bpf rstat flusher program type and
> the needed support in rstat code and libbpf. The rstat flusher program
> is a callback that the rstat framework makes to bpf when a stat flush is
> ongoing, similar to the css_rstat_flush() callback that rstat makes to
> cgroup controllers. Each callback is parameterized by a (cgroup, cpu)
> pair that has been updated. The program contains the logic for
> aggregating the stats across cpus and across the cgroup hierarchy.
> These programs can be attached to any cgroup subsystem, not only the
> ones that implement the css_rstat_flush() callback in the kernel. This
> gives bpf programs more flexibility, and more isolation from the kernel
> implementation.
>
> The following 2 patches add necessary helpers for the stats collection
> workflow. Helpers that call into cgroup_rstat_updated() and
> cgroup_rstat_flush() are added to allow bpf programs collecting stats to
> tell the rstat framework that a cgroup has been updated, and to allow
> bpf programs outputting stats to tell the rstat framework to flush the
> stats before they are displayed to the user. An additional
> bpf_map_lookup_percpu_elem is introduced to allow rstat flusher programs
> to access percpu stats of the cpu being flushed.
>
> The following 3 patches add the cgroup_iter program type (v2). This was
> originally introduced by Hao as a part of a different series [1].
> Their usecase is better showcased as part of this patch series. We also
> make cgroup_get_from_id() cgroup v1 friendly to allow cgroup_iter programs
> to display stats for cgroup v1 as well. This small change makes the
> entire workflow cgroup v1 friendly without any other dedicated changes.
>
> The final patch is a selftest demonstrating the entire workflow with a
> set of bpf programs that collect per-cgroup latency of memcg reclaim.
>
> [1]https://lore.kernel.org/lkml/20220225234339.2386398-9-haoluo@xxxxxxxxxx/
>
>
> Hao Luo (2):
>   cgroup: Add cgroup_put() in !CONFIG_CGROUPS case
>   bpf: Introduce cgroup iter
>
> Yosry Ahmed (7):
>   bpf: introduce CGROUP_SUBSYS_RSTAT program type
>   cgroup: bpf: flush bpf stats on rstat flush
>   libbpf: Add support for rstat progs and links
>   bpf: add bpf rstat helpers
>   bpf: add bpf_map_lookup_percpu_elem() helper
>   cgroup: add v1 support to cgroup_get_from_id()
>   bpf: add a selftest for cgroup hierarchical stats collection
>
>  include/linux/bpf-cgroup-subsys.h             |  35 ++
>  include/linux/bpf.h                           |   4 +
>  include/linux/bpf_types.h                     |   2 +
>  include/linux/cgroup-defs.h                   |   4 +
>  include/linux/cgroup.h                        |   5 +
>  include/uapi/linux/bpf.h                      |  45 +++
>  kernel/bpf/Makefile                           |   3 +-
>  kernel/bpf/arraymap.c                         |  11 +-
>  kernel/bpf/cgroup_iter.c                      | 148 ++++++++
>  kernel/bpf/cgroup_subsys.c                    | 212 +++++++++++
>  kernel/bpf/hashtab.c                          |  25 +-
>  kernel/bpf/helpers.c                          |  56 +++
>  kernel/bpf/syscall.c                          |   6 +
>  kernel/bpf/verifier.c                         |   6 +
>  kernel/cgroup/cgroup.c                        |  16 +-
>  kernel/cgroup/rstat.c                         |  11 +
>  scripts/bpf_doc.py                            |   2 +
>  tools/include/uapi/linux/bpf.h                |  45 +++
>  tools/lib/bpf/bpf.c                           |   3 +
>  tools/lib/bpf/bpf.h                           |   3 +
>  tools/lib/bpf/libbpf.c                        |  35 ++
>  tools/lib/bpf/libbpf.h                        |   3 +
>  tools/lib/bpf/libbpf.map                      |   1 +
>  .../test_cgroup_hierarchical_stats.c          | 335 ++++++++++++++++++
>  tools/testing/selftests/bpf/progs/bpf_iter.h  |   7 +
>  .../selftests/bpf/progs/cgroup_vmscan.c       | 211 +++++++++++
>  26 files changed, 1212 insertions(+), 22 deletions(-)
>  create mode 100644 include/linux/bpf-cgroup-subsys.h
>  create mode 100644 kernel/bpf/cgroup_iter.c
>  create mode 100644 kernel/bpf/cgroup_subsys.c
>  create mode 100644 tools/testing/selftests/bpf/prog_tests/test_cgroup_hierarchical_stats.c
>  create mode 100644 tools/testing/selftests/bpf/progs/cgroup_vmscan.c
>
> --
> 2.36.0.512.ge40c2bad7a-goog
>



[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Security]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]     [Monitors]

  Powered by Linux