Re: [PATCH bpf-next v1 3/5] bpf: Introduce cgroup iter

Hao Luo <haoluo@xxxxxxxxxx> · Fri, 20 May 2022 12:42:54 -0700

Hi Tejun and Yonghong,

On Fri, May 20, 2022 at 9:45 AM Tejun Heo <tj@xxxxxxxxxx> wrote:
> On Fri, May 20, 2022 at 09:29:43AM -0700, Yonghong Song wrote:
> > Maybe you can have a bpf program signature like below:
> >
> > int BPF_PROG(dump_vmscan, struct bpf_iter_meta *meta, struct cgroup *cgrp,
> > struct cgroup *parent_cgrp)
> >
> > parent_cgrp is NULL when cgrp is the root cgroup.
> >
> > I would like the bpf program should send the following information to
> > user space:
> >    <parent cgroup dir name> <current cgroup dir name>
>
> I don't think parent cgroup dir name would be sufficient to reconstruct the
> path given that multiple cgroups in different subtrees can have the same
> name. For live cgroups, userspace can find the path from id (or ino) without
> traversing anything by constructing the fhandle, open it open_by_handle_at()
> and then reading /proc/self/fd/$FD symlink -
> https://lkml.org/lkml/2020/12/2/1126. This isn't available for dead cgroups
> but I'm not sure how much that'd matter given that they aren't visible from
> userspace anyway.
>

Sending cgroup id is better than cgroup dir name, also because IIUC
the path obtained from cgroup id depends on the namespace of the
userspace process. So if the dump file may be potentially read by
processes within a container, it's better to have the output
namespaced IMO.

> >    <various stats interested by the user>
> >
> > This way, user space can easily construct the cgroup hierarchy stat like
> >                            cpu   mem   cpu pressure   mem pressure ...
> >    cgroup1                 ...
> >       child1               ...
> >         grandchild1        ...
> >       child2               ...
> >    cgroup 2                ...
> >       child 3              ...
> >         ...                ...
> >
> > the bpf iterator can have additional parameter like
> > cgroup_id = ... to only call bpf program once with that
> > cgroup_id if specified.

Yep, this should work. We just need to make the cgroup_id parameter
optional. If it is specified when creating bpf_iter_link, we print for
that cgroup only. If it is not specified, we iterate over all cgroups.
If I understand correctly, sounds doable.

> > The kernel part of cgroup_iter can call cgroup_rstat_flush()
> > before calling cgroup_iter bpf program.

Sounds good to me as well. But my knowledge on rstat_flush is limited.
Yosry can give this a try.

>
> Would it work to just pass in @cgrp and provide a group of helpers so that
> the program can do whatever it wanna do including looking up the full path
> and passing that to userspace?
>

My understanding is, yes, doable. If we need the full path information
of a cgroup, helpers or kfuncs are needed.

The userspace needs to specify the identity of the cgroup, when
creating bpf_iter. This identity could be cgroup id or fd. This
identity needs to be converted to cgroup object somewhere before
passing into bpf program to use.