On 7/9/24 6:58 PM, Waiman Long wrote:
> The /proc/cgroups file shows the number of cgroups for each of the
> subsystems. With cgroup v1, the number of CSSes is the same as the
> number of cgroups. That is not the case anymore with cgroup v2. The
> /proc/cgroups file cannot show the actual number of CSSes for the
> subsystems that are bound to cgroup v2.
>
> So if a v2 cgroup subsystem is leaking cgroups (usually memory cgroup),
> we can't tell by looking at /proc/cgroups which cgroup subsystems may be
> responsible. This patch adds CSS counts in the cgroup_subsys structure
> to keep track of the number of CSSes for each of the cgroup subsystems.
>
> As cgroup v2 had deprecated the use of /proc/cgroups, the root
> cgroup.stat file is extended to show the number of outstanding CSSes
> associated with all the non-inhibited cgroup subsystems that have been
> bound to cgroup v2. This will help us pinpoint which subsystems may be
> responsible for the increasing number of dying (nr_dying_descendants)
> cgroups.
>
> The cgroup-v2.rst file is updated to discuss this new behavior.
>
> With this patch applied, a sample output from root cgroup.stat file
> was shown below.
>
>	nr_descendants 53
>	nr_dying_descendants 34
>	nr_cpuset 1
>	nr_cpu 40
>	nr_io 40
>	nr_memory 87
>	nr_perf_event 54
>	nr_hugetlb 1
>	nr_pids 53
>	nr_rdma 1
>	nr_misc 1
>
> In this particular case, it can be seen that memory cgroup is the most
> likely culprit for causing the 34 dying cgroups.
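
To illustrate how the sample output above might be consumed from
userspace, here is a minimal sketch. It is not part of the patch: the
names parse_cgroup_stat(), stat_get(), top_subsys() and struct
stat_entry are hypothetical, and it assumes only the "key value" line
format shown in the sample. Treating the largest per-subsystem count as
the likely culprit is a heuristic, not something the kernel guarantees.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One "key value" line from cgroup.stat-style text (hypothetical type). */
struct stat_entry {
	char name[32];
	int count;
};

/* Sample root cgroup.stat output quoted in the commit message. */
static const char sample[] =
	"nr_descendants 53\n"
	"nr_dying_descendants 34\n"
	"nr_cpuset 1\n"
	"nr_cpu 40\n"
	"nr_io 40\n"
	"nr_memory 87\n"
	"nr_perf_event 54\n"
	"nr_hugetlb 1\n"
	"nr_pids 53\n"
	"nr_rdma 1\n"
	"nr_misc 1\n";

/* Parse up to max "key value" lines; returns the number of entries stored. */
static int parse_cgroup_stat(const char *text, struct stat_entry *out, int max)
{
	int n = 0, used = 0;

	assert(text && out);
	while (n < max &&
	       sscanf(text, " %31s %d%n", out[n].name, &out[n].count, &used) == 2) {
		text += used;	/* advance past the line just consumed */
		n++;
	}
	return n;
}

/* Return the count stored for key, or -1 if the key is absent. */
static int stat_get(const struct stat_entry *e, int n, const char *key)
{
	for (int i = 0; i < n; i++)
		if (strcmp(e[i].name, key) == 0)
			return e[i].count;
	return -1;
}

/*
 * Return the per-subsystem entry with the largest count, skipping the two
 * descendant counters, or NULL if there is no per-subsystem entry.
 */
static const struct stat_entry *top_subsys(const struct stat_entry *e, int n)
{
	const struct stat_entry *best = NULL;

	for (int i = 0; i < n; i++) {
		if (!strcmp(e[i].name, "nr_descendants") ||
		    !strcmp(e[i].name, "nr_dying_descendants"))
			continue;
		if (!best || e[i].count > best->count)
			best = &e[i];
	}
	return best;
}
```

A caller would read the root cgroup.stat file into a buffer, pass it to
parse_cgroup_stat(), and compare the top_subsys() count against
stat_get(..., "nr_descendants") to see which subsystem holds more CSSes
than there are live descendants.
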
>
> Signed-off-by: Waiman Long <longman@xxxxxxxxxx>
> ---
>  Documentation/admin-guide/cgroup-v2.rst | 10 ++++++++--
>  include/linux/cgroup-defs.h             |  3 +++
>  kernel/cgroup/cgroup.c                  | 19 +++++++++++++++++++
>  3 files changed, 30 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
> index 52763d6b2919..65af2f30196f 100644
> --- a/Documentation/admin-guide/cgroup-v2.rst
> +++ b/Documentation/admin-guide/cgroup-v2.rst
> @@ -981,6 +981,12 @@ All cgroup core files are prefixed with "cgroup."
>  		A dying cgroup can consume system resources not exceeding
>  		limits, which were active at the moment of cgroup deletion.
>
> +	  nr_<cgroup_subsys>
> +		Total number of cgroups associated with that cgroup
> +		subsystem, e.g. cpuset or memory. These cgroup counts
> +		will only be shown in the root cgroup and for subsystems
> +		bound to cgroup v2.
> +
>    cgroup.freeze
>  	A read-write single value file which exists on non-root cgroups.
>  	Allowed values are "0" and "1". The default is "0".
> @@ -2930,8 +2936,8 @@ Deprecated v1 Core Features
>
>  - "cgroup.clone_children" is removed.
>
> -- /proc/cgroups is meaningless for v2. Use "cgroup.controllers" file
> -  at the root instead.
> +- /proc/cgroups is meaningless for v2. Use "cgroup.controllers" or
> +  "cgroup.stat" files at the root instead.
>
>
>  Issues with v1 and Rationales for v2
> diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
> index b36690ca0d3f..522ab77f0406 100644
> --- a/include/linux/cgroup-defs.h
> +++ b/include/linux/cgroup-defs.h
> @@ -776,6 +776,9 @@ struct cgroup_subsys {
>  	 * specifies the mask of subsystems that this one depends on.
>  	 */
>  	unsigned int depends_on;
> +
> +	/* Number of CSSes, used only for /proc/cgroups */
> +	atomic_t nr_csses;
>  };
>
>  extern struct percpu_rw_semaphore cgroup_threadgroup_rwsem;
> diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
> index c8e4b62b436a..48eba2737b1a 100644
> --- a/kernel/cgroup/cgroup.c
> +++ b/kernel/cgroup/cgroup.c
> @@ -3669,12 +3669,27 @@ static int cgroup_events_show(struct seq_file *seq, void *v)
>  static int cgroup_stat_show(struct seq_file *seq, void *v)
>  {
>  	struct cgroup *cgroup = seq_css(seq)->cgroup;
> +	struct cgroup_subsys *ss;
> +	int i;
>
>  	seq_printf(seq, "nr_descendants %d\n",
>  		   cgroup->nr_descendants);
>  	seq_printf(seq, "nr_dying_descendants %d\n",
>  		   cgroup->nr_dying_descendants);
>
> +	if (cgroup_parent(cgroup))
> +		return 0;
> +
> +	/*
> +	 * For the root cgroup, shows the number of csses associated
> +	 * with each of non-inhibited cgroup subsystems bound to it.
> +	 */
> +	do_each_subsys_mask(ss, i, ~cgrp_dfl_inhibit_ss_mask) {
> +		if (ss->root != &cgrp_dfl_root)
> +			continue;
> +		seq_printf(seq, "nr_%s %d\n", ss->name,
> +			   atomic_read(&ss->nr_csses));
> +	} while_each_subsys_mask();
>  	return 0;
>  }
>

Thanks for adding nr_csses; the patch looks good to me. One preference:
an nr_<subsys>_css format would make the count easier to interpret.

With or without that change to the cgroup subsys format:

Reviewed-by: Kamalesh Babulal <kamalesh.babulal@xxxxxxxxxx>

-- 
Thanks,
Kamalesh