On Fri, May 20, 2022 at 12:43 PM Hao Luo <haoluo@xxxxxxxxxx> wrote: > > Hi Tejun and Yonghong, > > On Fri, May 20, 2022 at 9:45 AM Tejun Heo <tj@xxxxxxxxxx> wrote: > > On Fri, May 20, 2022 at 09:29:43AM -0700, Yonghong Song wrote: > > > Maybe you can have a bpf program signature like below: > > > > > > int BPF_PROG(dump_vmscan, struct bpf_iter_meta *meta, struct cgroup *cgrp, > > > struct cgroup *parent_cgrp) > > > > > > parent_cgrp is NULL when cgrp is the root cgroup. > > > > > > I would like the bpf program should send the following information to > > > user space: > > > <parent cgroup dir name> <current cgroup dir name> > > > > I don't think parent cgroup dir name would be sufficient to reconstruct the > > path given that multiple cgroups in different subtrees can have the same > > name. For live cgroups, userspace can find the path from id (or ino) without > > traversing anything by constructing the fhandle, open it open_by_handle_at() > > and then reading /proc/self/fd/$FD symlink - > > https://lkml.org/lkml/2020/12/2/1126. This isn't available for dead cgroups > > but I'm not sure how much that'd matter given that they aren't visible from > > userspace anyway. > > > > Sending cgroup id is better than cgroup dir name, also because IIUC > the path obtained from cgroup id depends on the namespace of the > userspace process. So if the dump file may be potentially read by > processes within a container, it's better to have the output > namespaced IMO. > > > > <various stats interested by the user> > > > > > > This way, user space can easily construct the cgroup hierarchy stat like > > > cpu mem cpu pressure mem pressure ... > > > cgroup1 ... > > > child1 ... > > > grandchild1 ... > > > child2 ... > > > cgroup 2 ... > > > child 3 ... > > > ... ... > > > > > > the bpf iterator can have additional parameter like > > > cgroup_id = ... to only call bpf program once with that > > > cgroup_id if specified. > > Yep, this should work. We just need to make the cgroup_id parameter > optional. If it is specified when creating bpf_iter_link, we print for > that cgroup only. If it is not specified, we iterate over all cgroups. > If I understand correctly, sounds doable. > > > > The kernel part of cgroup_iter can call cgroup_rstat_flush() > > > before calling cgroup_iter bpf program. > > Sounds good to me as well. But my knowledge on rstat_flush is limited. > Yosry can give this a try. > > > > > Would it work to just pass in @cgrp and provide a group of helpers so that > > the program can do whatever it wanna do including looking up the full path > > and passing that to userspace? > > > > My understanding is, yes, doable. If we need the full path information > of a cgroup, helpers or kfuncs are needed. > > The userspace needs to specify the identity of the cgroup, when > creating bpf_iter. This identity could be cgroup id or fd. This > identity needs to be converted to cgroup object somewhere before > passing into bpf program to use. Let's sum up the discussion here, I feel like we are losing track of the main problem. IIUC the main concern is that cgroup_iter is not effectively an iterator, it rather dumps information for one cgroup. I like the suggestion to make it iterate cgroups by default, and an optional cgroup_id parameter to make it only "iterate" this one cgroup. IIUC, this cgroup_id parameter would be a link parameter, similar to the current approach. Basically, we extend the current patch so that if cgroup_id is not specified the iterator gets called for all cgroups instead of one. This fixes the problem for our use case and also keeps cgroup_iter generic enough. Is my understanding correct? If yes, I don't see a need to flush rstat in the kernel on behalf of cgroup_iter progs.