On Mon, Jul 22, 2024 at 3:53 PM Shakeel Butt <shakeel.butt@xxxxxxxxx> wrote:
>
> Linux kernel does not expose memory.current on the root memcg and there
> are applications which have to traverse all the top level memcgs to
> calculate the total memory charged in the system. This is more expensive
> (directory traversal and multiple open and reads) and is racy on a busy
> machine. As the kernel already has the needed information i.e. root's
> memory.current, why not expose that?
>
> However root's memory.current will have different semantics than the
> non-root's memory.current as the kernel skips the charging for root, so
> maybe it is better to have a different named interface for the root.
> Something like memory.children_usage only for root memcg.
>
> Now there is still a question of why the kernel does not expose
> memory.current for the root. The historical reason was that the memcg
> charging was expensive and users were allowed to bypass the memcg
> charging by running in the root. However do we still want to
> have this exception today? What is stopping us from charging the
> root memcg as well? Of course the root will not have limits but the
> allocations will go through memcg charging and then the memory.current
> of root and non-root will have the same semantics.
>
> This is an RFC to start a discussion on memcg charging for root.

I vaguely remember that when running some netperf tests (tcp_rr?) in a
cgroup, performance decreased considerably with every level down the
hierarchy. I am assuming that charging was part of the reason. If
that's the case, charging the root will be similar to moving all
workloads one level down the hierarchy in terms of charging overhead.
>
> Signed-off-by: Shakeel Butt <shakeel.butt@xxxxxxxxx>
> ---
>  Documentation/admin-guide/cgroup-v2.rst | 6 ++++++
>  mm/memcontrol.c                         | 5 +++++
>  2 files changed, 11 insertions(+)
>
> diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
> index 6c6075ed4aa5..e4afc05fd8ea 100644
> --- a/Documentation/admin-guide/cgroup-v2.rst
> +++ b/Documentation/admin-guide/cgroup-v2.rst
> @@ -1220,6 +1220,12 @@ PAGE_SIZE multiple when read back.
>  	The total amount of memory currently being used by the cgroup
>  	and its descendants.
>
> +  memory.children_usage
> +	A read-only single value file which exists only on root cgroup.
> +
> +	The total amount of memory currently being used by the
> +	descendants of the root cgroup.
> +
>    memory.min
>  	A read-write single value file which exists on non-root
>  	cgroups. The default is "0".
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 960371788687..eba8cf76d3d3 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -4304,6 +4304,11 @@ static struct cftype memory_files[] = {
>  		.flags = CFTYPE_NOT_ON_ROOT,
>  		.read_u64 = memory_current_read,
>  	},
> +	{
> +		.name = "children_usage",
> +		.flags = CFTYPE_ONLY_ON_ROOT,
> +		.read_u64 = memory_current_read,
> +	},
>  	{
>  		.name = "peak",
>  		.flags = CFTYPE_NOT_ON_ROOT,
> --
> 2.43.0
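
For reference, the userspace traversal that the commit message describes
could look roughly like the sketch below. This is only an illustration:
the function name sum_top_level_memory_current() is hypothetical, and the
cgroup v2 mount point (commonly /sys/fs/cgroup) is passed in by the
caller rather than assumed. It has the costs mentioned above: one
directory walk plus one open and read per top-level cgroup, with no
consistent snapshot across the reads.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: sum memory.current over the top-level cgroups
 * found directly under cgroup_root. Returns 0 and stores the sum in
 * *total on success, or -1 if cgroup_root cannot be opened.
 *
 * The reads are not atomic with respect to each other, so on a busy
 * machine the result is only an approximation.
 */
int sum_top_level_memory_current(const char *cgroup_root,
                                 unsigned long long *total)
{
    DIR *dir = opendir(cgroup_root);
    struct dirent *de;
    unsigned long long sum = 0;

    if (!dir)
        return -1;

    while ((de = readdir(dir)) != NULL) {
        char path[4096];
        struct stat st;
        unsigned long long usage;
        FILE *f;
        int n;

        if (!strcmp(de->d_name, ".") || !strcmp(de->d_name, ".."))
            continue;

        /* Only child directories are cgroups; skip control files. */
        n = snprintf(path, sizeof(path), "%s/%s", cgroup_root, de->d_name);
        if (n < 0 || (size_t)n >= sizeof(path))
            continue;
        if (stat(path, &st) != 0 || !S_ISDIR(st.st_mode))
            continue;

        n = snprintf(path, sizeof(path), "%s/%s/memory.current",
                     cgroup_root, de->d_name);
        if (n < 0 || (size_t)n >= sizeof(path))
            continue;

        /* The cgroup may have been removed, or may lack the memory
         * controller; in either case just skip it. */
        f = fopen(path, "r");
        if (!f)
            continue;
        if (fscanf(f, "%llu", &usage) == 1)
            sum += usage;
        fclose(f);
    }

    closedir(dir);
    *total = sum;
    return 0;
}
```

A caller would pass the cgroup2 mount point, e.g.
sum_top_level_memory_current("/sys/fs/cgroup", &total). With the
proposed patch, the whole loop would reduce to a single read of
memory.children_usage at the root.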