On 7/9/24 9:39 PM, Waiman Long wrote: > On 7/9/24 11:58, Kamalesh Babulal wrote: >> >> On 7/9/24 6:58 PM, Waiman Long wrote: >>> The /proc/cgroups file shows the number of cgroups for each of the >>> subsystems. With cgroup v1, the number of CSSes is the same as the >>> number of cgroups. That is not the case anymore with cgroup v2. The >>> /proc/cgroups file cannot show the actual number of CSSes for the >>> subsystems that are bound to cgroup v2. >>> >>> So if a v2 cgroup subsystem is leaking cgroups (usually memory cgroup), >>> we can't tell by looking at /proc/cgroups which cgroup subsystems may be >>> responsible. This patch adds CSS counts in the cgroup_subsys structure >>> to keep track of the number of CSSes for each of the cgroup subsystems. >>> >>> As cgroup v2 had deprecated the use of /proc/cgroups, the root >>> cgroup.stat file is extended to show the number of outstanding CSSes >>> associated with all the non-inhibited cgroup subsystems that have been >>> bound to cgroup v2. This will help us pinpoint which subsystems may be >>> responsible for the increasing number of dying (nr_dying_descendants) >>> cgroups. >>> >>> The cgroup-v2.rst file is updated to discuss this new behavior. >>> >>> With this patch applied, a sample output from root cgroup.stat file >>> was shown below. >>> >>> nr_descendants 53 >>> nr_dying_descendants 34 >>> nr_cpuset 1 >>> nr_cpu 40 >>> nr_io 40 >>> nr_memory 87 >>> nr_perf_event 54 >>> nr_hugetlb 1 >>> nr_pids 53 >>> nr_rdma 1 >>> nr_misc 1 >>> >>> In this particular case, it can be seen that memory cgroup is the most >>> likely culprit for causing the 34 dying cgroups. >>> >>> Signed-off-by: Waiman Long <longman@xxxxxxxxxx> [...] >>> diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c >>> index c8e4b62b436a..48eba2737b1a 100644 >>> --- a/kernel/cgroup/cgroup.c >>> +++ b/kernel/cgroup/cgroup.c >>> @@ -3669,12 +3669,27 @@ static int cgroup_events_show(struct seq_file *seq, void *v) >>> static int cgroup_stat_show(struct seq_file *seq, void *v) >>> { >>> struct cgroup *cgroup = seq_css(seq)->cgroup; >>> + struct cgroup_subsys *ss; >>> + int i; >>> seq_printf(seq, "nr_descendants %d\n", >>> cgroup->nr_descendants); >>> seq_printf(seq, "nr_dying_descendants %d\n", >>> cgroup->nr_dying_descendants); >>> + if (cgroup_parent(cgroup)) >>> + return 0; >>> + >>> + /* >>> + * For the root cgroup, shows the number of csses associated >>> + * with each of non-inhibited cgroup subsystems bound to it. >>> + */ >>> + do_each_subsys_mask(ss, i, ~cgrp_dfl_inhibit_ss_mask) { >>> + if (ss->root != &cgrp_dfl_root) >>> + continue; >>> + seq_printf(seq, "nr_%s %d\n", ss->name, >>> + atomic_read(&ss->nr_csses)); >>> + } while_each_subsys_mask(); >>> return 0; >>> } >>> >> Thanks for adding nr_csses, the patch looks good to me. A preference comment, >> nr_<subsys>_css format, makes it easier to interpret the count. >> >> With or without the changes to the cgroup subsys format: >> >> Reviewed-by: Kamalesh Babulal <kamalesh.babulal@xxxxxxxxxx> > > Thanks for the review. > > CSS is a kernel internal name for cgroup subsystem state. Non kernel developers or users may not know what CSS is and cgroup-v2.rst doesn't mention CSS at all. So I don't think it is a good idea to add the "_css" suffix. From the user point of view, the proper term to use here is the number of cgroups, just like what "nr_descendants" and "nr_dying_descendants" are referring to before this patch. The only issue that I didn't address is the use of the proper plural form which is hard for cgroup subsystem names that we have. Agreed, css might not be a familiar term to many outside kernel development, though it has been introduced to an extent in cgroup-v1 documentation. nr_<subsys> count is the sum of all subsys state objects, and suffix "_css" sounded preferable to me, but might not be so intuitive to the user. nr_<subsys>_cgrps was the other suffix, I thought of, but it was more redundant when read from cgroups.stat file. -- Thanks, Kamalesh