This patchset introduces a new program type to extend the cgroup
interfaces. It extends the bpf filesystem (bpffs) to allow creating a
directory hierarchy that tracks any kernfs hierarchy, in particular
cgroupfs. Each subdirectory in this hierarchy keeps a reference to the
corresponding kernfs node when created. This is done by associating a
new data structure (called "tag" in this patchset) with the bpffs
directory inodes. A file inode in bpffs holds a reference to a bpf
object, and a directory inode may point to a tag, which holds a kernfs
node. A bpf object can be pinned in these directories, and objects can
choose to enable inheritance in tagged directories.

In this patchset, a new program type "cgroup_view" is introduced, which
supports inheritance. More specifically, when a link to a cgroup_view
prog is pinned in a bpffs directory, it tags the directory and connects
the directory to the root cgroup. Subdirectories created underneath
have to match a subcgroup and, when created, inherit the pinned
cgroup_view link from the parent directory.

The pinned cgroup_view objects can be read as files. When the object is
read, it tries to get the cgroup its parent directory is matched to. If
it fails to get a reference to the cgroup, the cgroup_view prog is not
run. Users can implement a cgroup_view program to define what to print
out to the file, given the cgroup object.

See patch 5/5 for an example of how this works.

Userspace has to manually create/remove directories in bpffs to mirror
the cgroup hierarchy. It was suggested to use overlayfs to create a
hierarchy that contains both cgroupfs and bpffs. But after some
experiments, I found that overlayfs is not intended to be used this
way: overlayfs assumes the underlying filesystem will not change [1],
but our underlying fs (i.e. cgroupfs) will change and cause weird
behavior. So I didn't pursue that direction.

This patchset v2 is only for demonstrating the high level design.
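
The manual mirroring step mentioned above can be sketched in userspace
C roughly as follows. This is only an illustration and not part of the
patchset: mirror_tree() is a hypothetical helper, and it assumes that
dst is a bpffs directory which already has a cgroup_view link pinned in
it, so that each mkdir() creates a subdirectory matching the subcgroup
of the same name. Parents are created before children, since a
subdirectory inherits the link from its parent directory.

```c
#include <dirent.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Hypothetical helper: recreate every subdirectory of src (e.g. a
 * cgroupfs subtree) under dst (e.g. a tagged bpffs directory).
 * Parents are created before their children.
 * Returns 0 on success, -1 on the first error.
 */
static int mirror_tree(const char *src, const char *dst)
{
	char s[4096], d[4096];
	struct dirent *de;
	struct stat st;
	int ret = 0;
	DIR *dir;

	dir = opendir(src);
	if (!dir)
		return -1;

	while (!ret && (de = readdir(dir))) {
		if (!strcmp(de->d_name, ".") || !strcmp(de->d_name, ".."))
			continue;

		/* Build the source path and the mirrored path; fail if truncated. */
		if (snprintf(s, sizeof(s), "%s/%s", src, de->d_name) >= (int)sizeof(s) ||
		    snprintf(d, sizeof(d), "%s/%s", dst, de->d_name) >= (int)sizeof(d)) {
			ret = -1;
			break;
		}

		/* Only directories correspond to cgroups; skip control files. */
		if (lstat(s, &st) || !S_ISDIR(st.st_mode))
			continue;

		/* An already existing mirror directory is not an error. */
		if (mkdir(d, 0755) && errno != EEXIST)
			ret = -1;
		else
			ret = mirror_tree(s, d);
	}

	closedir(dir);
	return ret;
}
```

For example, mirror_tree("/sys/fs/cgroup", "/sys/fs/bpf/cgroup_view")
would mirror the whole hierarchy, where the second path is a made-up
pin location. Directories of cgroups that have since been removed still
have to be cleaned up separately.
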
There are a lot of places in its implementation that can be improved.
Cgroup_view is implemented as a type of bpf_iter because seqfile
printing is well supported in bpf_iter, even though cgroup_view does
not iterate over kernel objects.

Changes v1->v2:
 - Complete redesign. v1 implements pinning bpf objects in cgroupfs [2].
   v2 implements object inheritance in bpffs. Due to its simplicity,
   bpffs is better suited for implementing inheritance than cgroupfs.
 - Extend selftest to include a more realistic use case. The selftest
   in v2 develops a cgroup-level sched performance metric and exports
   it through the new prog type.

[1] https://www.kernel.org/doc/html/latest/filesystems/overlayfs.html#changes-to-underlying-filesystems
[2] https://lore.kernel.org/bpf/Ydd1IIUG7%2F3kQRcR@xxxxxxxxxx/

Hao Luo (5):
  bpf: Bpffs directory tag
  bpf: Introduce inherit list for dir tag.
  bpf: cgroup_view iter
  bpf: Pin cgroup_view
  selftests/bpf: test for pinning for cgroup_view link

 include/linux/bpf.h                           |   2 +
 kernel/bpf/Makefile                           |   2 +-
 kernel/bpf/bpf_iter.c                         |  11 +
 kernel/bpf/cgroup_view_iter.c                 | 114 ++++++++
 kernel/bpf/inode.c                            | 272 +++++++++++++++++-
 kernel/bpf/inode.h                            |  55 ++++
 .../selftests/bpf/prog_tests/pinning_cgroup.c | 143 +++++++++
 tools/testing/selftests/bpf/progs/bpf_iter.h  |   7 +
 .../bpf/progs/bpf_iter_cgroup_view.c          | 232 +++++++++++++++
 9 files changed, 829 insertions(+), 9 deletions(-)
 create mode 100644 kernel/bpf/cgroup_view_iter.c
 create mode 100644 kernel/bpf/inode.h
 create mode 100644 tools/testing/selftests/bpf/prog_tests/pinning_cgroup.c
 create mode 100644 tools/testing/selftests/bpf/progs/bpf_iter_cgroup_view.c

-- 
2.35.0.rc2.247.g8bbb082509-goog