On Wed, Jan 13, 2016 at 02:49:16PM -0800, Andrew Morton wrote:
> It would be nice to see example output, and a description of why this
> output was chosen: what was included, what was omitted, why it was
> presented this way, what units were chosen for displaying the stats and
> why. Will the things which are being displayed still be relevant (or
> even available) 10 years from now. etcetera.
>
> And the interface should be documented at some point. Doing it now
> will help with the review of the proposed interface.
>
> Because this stuff is forever and we have to get it right.

Here is a follow-up to 1/2 that hopefully addresses all that, as well
as the 32-bit overflow problem. What do you think?

I'm probably a bit too optimistic about being able to maintain a
meaningful sort order of the file when adding new entries. It depends
on whether people start relying on items staying at fixed offsets and
what we tell them in response when that breaks. I hope that we can at
least get the main memory consumers in before this is released, just
in case.

From 1be87db16a3895538ce65362b5234ef9c8af308d Mon Sep 17 00:00:00 2001
From: Johannes Weiner <hannes@xxxxxxxxxxx>
Date: Thu, 14 Jan 2016 10:40:24 -0500
Subject: [PATCH] mm: memcontrol: basic memory statistics in cgroup2 memory controller fix

Fixlet addressing akpm's feedback:

- Fix overflowing byte counters on 32-bit. Just like in the existing
  interface files, bytes must be printed as u64 to work with highmem.

- Add documentation in cgroup.txt that explains the memory.stat file
  and its format.

- Rethink item ordering to accommodate potential future additions. The
  ordering now follows both 1) from big picture to detail and 2) from
  stats that reflect on userspace behavior towards stats that reflect
  on kernel heuristics. Both are gradients, and item-by-item ordering
  will still require judgement calls (and some bike shed painting).

Changelog addendum to the original patch:

The output of this file looks as follows:

  $ cat memory.stat
  anon 167936
  file 87302144
  file_mapped 0
  file_dirty 0
  file_writeback 0
  inactive_anon 0
  active_anon 155648
  inactive_file 87298048
  active_file 4096
  unevictable 0
  pgfault 636
  pgmajfault 0

The list consists of two sections: statistics reflecting the current
state of the memory management subsystem, and statistics reflecting
past events. The items themselves are sorted such that generic big
picture items come before specific details, and items related to
userspace activity come before items related to kernel heuristics.

All memory counters are in bytes to eliminate all ambiguity with
variable page sizes.

There will be more items and statistics added in the future, but this
is a good initial set to get a minimum of insight into how a cgroup is
using memory, and the items chosen for now are likely to remain valid
even with significant changes to the memory management implementation.

Signed-off-by: Johannes Weiner <hannes@xxxxxxxxxxx>
---
 Documentation/cgroup.txt | 56 ++++++++++++++++++++++++++++++++++++++++++++++++
 mm/memcontrol.c          | 45 +++++++++++++++++++++++---------------
 2 files changed, 84 insertions(+), 17 deletions(-)

diff --git a/Documentation/cgroup.txt b/Documentation/cgroup.txt
index f441564..65b3eac 100644
--- a/Documentation/cgroup.txt
+++ b/Documentation/cgroup.txt
@@ -819,6 +819,62 @@ PAGE_SIZE multiple when read back.
 		the cgroup. This may not exactly match the number of
 		processes killed but should generally be close.
 
+  memory.stat
+
+	A read-only flat-keyed file which exists on non-root cgroups.
+
+	This breaks down the cgroup's memory footprint into different
+	types of memory, type-specific details, and other information
+	on the state and past events of the memory management system.
+
+	All memory amounts are in bytes.
+
+	The entries are ordered to be human readable, and new entries
+	can show up in the middle. Don't rely on items remaining in a
+	fixed position; use the keys to look up specific values!
+
+	  anon
+
+		Amount of memory used in anonymous mappings such as
+		brk(), sbrk(), and mmap(MAP_ANONYMOUS)
+
+	  file
+
+		Amount of memory used to cache filesystem data,
+		including tmpfs and shared memory.
+
+	  file_mapped
+
+		Amount of cached filesystem data mapped with mmap()
+
+	  file_dirty
+
+		Amount of cached filesystem data that was modified but
+		not yet written back to disk
+
+	  file_writeback
+
+		Amount of cached filesystem data that was modified and
+		is currently being written back to disk
+
+	  inactive_anon
+	  active_anon
+	  inactive_file
+	  active_file
+	  unevictable
+
+		Amount of memory, swap-backed and filesystem-backed,
+		on the internal memory management lists used by the
+		page reclaim algorithm
+
+	  pgfault
+
+		Total number of page faults incurred
+
+	  pgmajfault
+
+		Number of major page faults incurred
+
   memory.swap.current
 
 	A read-only single value file which exists on non-root
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 8645852..cdb51a9 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5112,32 +5112,43 @@ static int memory_stat_show(struct seq_file *m, void *v)
 	struct mem_cgroup *memcg = mem_cgroup_from_css(seq_css(m));
 	int i;
 
-	/* Memory consumer totals */
-
-	seq_printf(m, "anon %lu\n",
-		   tree_stat(memcg, MEM_CGROUP_STAT_RSS) * PAGE_SIZE);
-	seq_printf(m, "file %lu\n",
-		   tree_stat(memcg, MEM_CGROUP_STAT_CACHE) * PAGE_SIZE);
+	/*
+	 * Provide statistics on the state of the memory subsystem as
+	 * well as cumulative event counters that show past behavior.
+	 *
+	 * This list is ordered following a combination of these gradients:
+	 * 1) generic big picture -> specifics and details
+	 * 2) reflecting userspace activity -> reflecting kernel heuristics
+	 *
+	 * Current memory state:
+	 */
 
-	/* Per-consumer breakdowns */
+	seq_printf(m, "anon %llu\n",
+		   (u64)tree_stat(memcg, MEM_CGROUP_STAT_RSS) * PAGE_SIZE);
+	seq_printf(m, "file %llu\n",
+		   (u64)tree_stat(memcg, MEM_CGROUP_STAT_CACHE) * PAGE_SIZE);
+
+	seq_printf(m, "file_mapped %llu\n",
+		   (u64)tree_stat(memcg, MEM_CGROUP_STAT_FILE_MAPPED) *
+		   PAGE_SIZE);
+	seq_printf(m, "file_dirty %llu\n",
+		   (u64)tree_stat(memcg, MEM_CGROUP_STAT_DIRTY) *
+		   PAGE_SIZE);
+	seq_printf(m, "file_writeback %llu\n",
+		   (u64)tree_stat(memcg, MEM_CGROUP_STAT_WRITEBACK) *
+		   PAGE_SIZE);
 
 	for (i = 0; i < NR_LRU_LISTS; i++) {
 		struct mem_cgroup *mi;
 		unsigned long val = 0;
 
 		for_each_mem_cgroup_tree(mi, memcg)
-			val += mem_cgroup_nr_lru_pages(mi, BIT(i)) * PAGE_SIZE;
-		seq_printf(m, "%s %lu\n", mem_cgroup_lru_names[i], val);
+			val += mem_cgroup_nr_lru_pages(mi, BIT(i));
+		seq_printf(m, "%s %llu\n",
+			   mem_cgroup_lru_names[i], (u64)val * PAGE_SIZE);
 	}
 
-	seq_printf(m, "file_mapped %lu\n",
-		   tree_stat(memcg, MEM_CGROUP_STAT_FILE_MAPPED) * PAGE_SIZE);
-	seq_printf(m, "file_dirty %lu\n",
-		   tree_stat(memcg, MEM_CGROUP_STAT_DIRTY) * PAGE_SIZE);
-	seq_printf(m, "file_writeback %lu\n",
-		   tree_stat(memcg, MEM_CGROUP_STAT_WRITEBACK) * PAGE_SIZE);
-
-	/* Memory management events */
+	/* Accumulated memory events */
 
 	seq_printf(m, "pgfault %lu\n",
 		   tree_events(memcg, MEM_CGROUP_EVENTS_PGFAULT));
-- 
2.7.0
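
For illustration only, and not part of the patch: the memory.stat
documentation asks consumers to look up values by key instead of
relying on item position. A minimal userspace sketch of that in C
could look as follows. memstat_lookup() is a hypothetical helper name,
and the code assumes every line has the "<key> <value>" form shown in
the example output. Values are parsed as unsigned long long because
the byte counters are printed as u64 and may not fit in 32 bits.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Look up one key in the text of a flat-keyed file such as memory.stat.
 * Each line is assumed to be "<key> <value>\n". Matching is done by key
 * name, never by line position, so entries added in the middle of the
 * file do not change the result.
 *
 * Returns 0 and stores the value in *out on success, or -1 if the key
 * is missing or its value is not a plain decimal number.
 *
 * memstat_lookup() is a hypothetical name, not an existing API.
 */
static int memstat_lookup(const char *text, const char *key,
			  unsigned long long *out)
{
	size_t keylen = strlen(key);
	const char *line = text;

	while (line && *line) {
		const char *next = strchr(line, '\n');

		/* The key must be followed directly by a single space. */
		if (!strncmp(line, key, keylen) && line[keylen] == ' ') {
			const char *val = line + keylen + 1;
			char *end;

			/* Reject signs and whitespace strtoull() would skip. */
			if (*val < '0' || *val > '9')
				return -1;

			errno = 0;
			*out = strtoull(val, &end, 10);
			if (errno || (*end != '\n' && *end != '\0'))
				return -1;
			return 0;
		}

		/* Advance to the start of the next line, if there is one. */
		line = next ? next + 1 : NULL;
	}

	return -1;
}
```

A caller would read memory.stat into a NUL-terminated buffer and then
call, for example, memstat_lookup(buf, "file_dirty", &bytes). Because
the match is anchored at the start of a line and requires a space
after the key, "file" does not match "file_mapped" and "anon" does not
match "inactive_anon".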