On Tue, Sep 15, 2020 at 3:07 AM Randy Dunlap <rdunlap@xxxxxxxxxxxxx> wrote:
>
> On 9/13/20 12:00 AM, Muchun Song wrote:
> > In the cgroup v1, we have a numa_stat interface. This is useful for
> > providing visibility into the numa locality information within an
> > memcg since the pages are allowed to be allocated from any physical
> > node. One of the use cases is evaluating application performance by
> > combining this information with the application's CPU allocation.
> > But the cgroup v2 does not. So this patch adds the missing information.
> >
> > Signed-off-by: Muchun Song <songmuchun@xxxxxxxxxxxxx>
> > Suggested-by: Shakeel Butt <shakeelb@xxxxxxxxxx>
> > Reported-by: kernel test robot <lkp@xxxxxxxxx>
> > ---
> > changelog in v3:
> > 1. Fix compiler error on powerpc architecture reported by kernel test robot.
> > 2. Fix a typo from "anno" to "anon".
> >
> > changelog in v2:
> > 1. Add memory.numa_stat interface in cgroup v2.
> >
> >  Documentation/admin-guide/cgroup-v2.rst |  72 ++++++++++++++++
> >  mm/memcontrol.c                         | 107 ++++++++++++++++++++++++
> >  2 files changed, 179 insertions(+)
> >
> > diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
> > index 6be43781ec7f..92207f0012e4 100644
> > --- a/Documentation/admin-guide/cgroup-v2.rst
> > +++ b/Documentation/admin-guide/cgroup-v2.rst
> > @@ -1368,6 +1368,78 @@ PAGE_SIZE multiple when read back.
> >         collapsing an existing range of pages.  This counter is not
> >         present when CONFIG_TRANSPARENT_HUGEPAGE is not set.
> >
> > +  memory.numa_stat
> > +    A read-only flat-keyed file which exists on non-root cgroups.
> > +
> > +    This breaks down the cgroup's memory footprint into different
> > +    types of memory, type-specific details, and other information
> > +    per node on the state of the memory management system.
> > +
> > +    This is useful for providing visibility into the numa locality
>
> capitalize acronyms, please: NUMA

OK, I will do that. Thanks.
> >
> > +    information within an memcg since the pages are allowed to be
> > +    allocated from any physical node. One of the use cases is evaluating
> > +    application performance by combining this information with the
> > +    application's CPU allocation.
> > +
> > +    All memory amounts are in bytes.
> > +
> > +    The output format of memory.numa_stat is::
> > +
> > +      type N0=<node 0 pages> N1=<node 1 pages> ...
>
> Now I'm confused. 5 lines above here it says "All memory amounts are in bytes"
> but these appear to be in pages. Which is it? and what size pages if that matters?

Sorry. It's my mistake. I will fix it.

>
> Is it like this?
>       type N0=<bytes in node 0 pages> N1=<bytes in node 1 pages> ...

Thanks.

> >
> > +    The entries are ordered to be human readable, and new entries
> > +    can show up in the middle. Don't rely on items remaining in a
> > +    fixed position; use the keys to look up specific values!
> > +
> > +      anon
> > +        Amount of memory per node used in anonymous mappings such
> > +        as brk(), sbrk(), and mmap(MAP_ANONYMOUS)
> > +
> > +      file
> > +        Amount of memory per node used to cache filesystem data,
> > +        including tmpfs and shared memory.
> > +
> > +      kernel_stack
> > +        Amount of memory per node allocated to kernel stacks.
> > +
> > +      shmem
> > +        Amount of cached filesystem data per node that is swap-backed,
> > +        such as tmpfs, shm segments, shared anonymous mmap()s
> > +
> > +      file_mapped
> > +        Amount of cached filesystem data per node mapped with mmap()
> > +
> > +      file_dirty
> > +        Amount of cached filesystem data per node that was modified but
> > +        not yet written back to disk
> > +
> > +      file_writeback
> > +        Amount of cached filesystem data per node that was modified and
> > +        is currently being written back to disk
> > +
> > +      anon_thp
> > +        Amount of memory per node used in anonymous mappings backed by
> > +        transparent hugepages
> > +
> > +      inactive_anon, active_anon, inactive_file, active_file, unevictable
> > +        Amount of memory, swap-backed and filesystem-backed,
> > +        per node on the internal memory management lists used
> > +        by the page reclaim algorithm.
> > +
> > +        As these represent internal list state (eg. shmem pages are on anon
>
> e.g.

Thanks.

>
> > +        memory management lists), inactive_foo + active_foo may not be equal to
> > +        the value for the foo counter, since the foo counter is type-based, not
> > +        list-based.
> > +
> > +      slab_reclaimable
> > +        Amount of memory per node used for storing in-kernel data
> > +        structures which might be reclaimed, such as dentries and
> > +        inodes.
> > +
> > +      slab_unreclaimable
> > +        Amount of memory per node used for storing in-kernel data
> > +        structures which cannot be reclaimed on memory pressure.
>
> Some of the descriptions above end with a '.' and some do not. Please be consistent.

Will do that.

> > +
> >  memory.swap.current
> >    A read-only single value file which exists on non-root
> >    cgroups.
>
>
> thanks.
> --
> ~Randy
>

--
Yours,
Muchun
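As an aside for readers following the thread: the documentation above says to look entries up by key rather than by position, since new stat types may appear in the middle of the file. A minimal sketch of a key-first parser for the flat-keyed `type N0=<bytes> N1=<bytes> ...` format is below; the `parse_numa_stat` helper name and the sample values are illustrative only, not part of the patch (real data would be read from a cgroup's memory.numa_stat file once this interface lands).

```python
def parse_numa_stat(text):
    """Parse cgroup v2 flat-keyed per-node output into nested dicts.

    Each line looks like: 'type N0=<bytes> N1=<bytes> ...'.
    Entries are looked up by key, not position, because the
    documentation warns that new entries can show up in the middle.
    """
    stats = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        key, *fields = line.split()
        stats[key] = {
            node: int(value)
            for node, _sep, value in (f.partition("=") for f in fields)
        }
    return stats


# Illustrative sample; real input would come from the cgroup's
# memory.numa_stat file.
sample = """\
anon N0=8331264 N1=4096
file N0=2035712 N1=0
"""

per_node = parse_numa_stat(sample)
print(per_node["anon"]["N0"])  # 8331264
```

Keying on the stat name keeps such a parser working even if the kernel later inserts new counters between existing ones, which is exactly the compatibility behavior the documentation calls out.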