On Fri, Apr 26, 2024 at 05:37:29PM -0700, Shakeel Butt wrote:
> At the moment, the amount of memory allocated for stats related structs
> in the mem_cgroup corresponds to the size of enum node_stat_item.
> However not all fields in enum node_stat_item have corresponding memcg
> stats. So, let's use an indirection mechanism similar to the one used
> for memcg vmstats management.
>
> For a given x86_64 config, the size of stats with and without the patch is:
>
> structs size in bytes          w/o     with
>
> struct lruvec_stats            1128    648
> struct lruvec_stats_percpu      752    432
> struct memcg_vmstats           1832    1352
> struct memcg_vmstats_percpu    1280    960
>
> The memory savings are further compounded by the fact that these structs
> are allocated for each cpu and for each node. To be precise, for each
> memcg the memory saved would be:
>
> Memory saved = ((21 * 3 * NR_NODES) + (21 * 2 * NR_NODES * NR_CPUS) +
>                 (21 * 3) + (21 * 2 * NR_CPUS)) * sizeof(long)
>
> Where 21 is the number of fields eliminated.

Nice savings!

>
> Signed-off-by: Shakeel Butt <shakeel.butt@xxxxxxxxx>
> ---
>  mm/memcontrol.c | 138 ++++++++++++++++++++++++++++++++++++++++--------
>  1 file changed, 115 insertions(+), 23 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 5e337ed6c6bf..c164bc9b8ed6 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -576,35 +576,105 @@ mem_cgroup_largest_soft_limit_node(struct mem_cgroup_tree_per_node *mctz)
>          return mz;
>  }
>
> +/* Subset of node_stat_item for memcg stats */
> +static const unsigned int memcg_node_stat_items[] = {
> +        NR_INACTIVE_ANON,
> +        NR_ACTIVE_ANON,
> +        NR_INACTIVE_FILE,
> +        NR_ACTIVE_FILE,
> +        NR_UNEVICTABLE,
> +        NR_SLAB_RECLAIMABLE_B,
> +        NR_SLAB_UNRECLAIMABLE_B,
> +        WORKINGSET_REFAULT_ANON,
> +        WORKINGSET_REFAULT_FILE,
> +        WORKINGSET_ACTIVATE_ANON,
> +        WORKINGSET_ACTIVATE_FILE,
> +        WORKINGSET_RESTORE_ANON,
> +        WORKINGSET_RESTORE_FILE,
> +        WORKINGSET_NODERECLAIM,
> +        NR_ANON_MAPPED,
> +        NR_FILE_MAPPED,
> +        NR_FILE_PAGES,
> +        NR_FILE_DIRTY,
> +        NR_WRITEBACK,
> +        NR_SHMEM,
> +        NR_SHMEM_THPS,
> +        NR_FILE_THPS,
> +        NR_ANON_THPS,
> +        NR_KERNEL_STACK_KB,
> +        NR_PAGETABLE,
> +        NR_SECONDARY_PAGETABLE,
> +#ifdef CONFIG_SWAP
> +        NR_SWAPCACHE,
> +#endif
> +};
> +
> +static const unsigned int memcg_stat_items[] = {
> +        MEMCG_SWAP,
> +        MEMCG_SOCK,
> +        MEMCG_PERCPU_B,
> +        MEMCG_VMALLOC,
> +        MEMCG_KMEM,
> +        MEMCG_ZSWAP_B,
> +        MEMCG_ZSWAPPED,
> +};
> +
> +#define NR_MEMCG_NODE_STAT_ITEMS ARRAY_SIZE(memcg_node_stat_items)
> +#define NR_MEMCG_STATS (NR_MEMCG_NODE_STAT_ITEMS + ARRAY_SIZE(memcg_stat_items))
> +static int8_t mem_cgroup_stats_index[MEMCG_NR_STAT] __read_mostly;
> +
> +static void init_memcg_stats(void)
> +{
> +        int8_t i, j = 0;
> +
> +        /* Switch to short once this failure occurs. */
> +        BUILD_BUG_ON(NR_MEMCG_STATS >= 127 /* INT8_MAX */);
> +
> +        for (i = 0; i < NR_MEMCG_NODE_STAT_ITEMS; ++i)
> +                mem_cgroup_stats_index[memcg_node_stat_items[i]] = ++j;
> +
> +        for (i = 0; i < ARRAY_SIZE(memcg_stat_items); ++i)
> +                mem_cgroup_stats_index[memcg_stat_items[i]] = ++j;
> +}
> +
> +static inline int memcg_stats_index(int idx)
> +{
> +        return mem_cgroup_stats_index[idx] - 1;
> +}

Hm, I'm slightly worried about the performance penalty due to the increased
cache footprint. Can't we have some formula to translate idx to memcg_idx
instead of a translation table?

If it requires a re-arrangement of items, we can add a translation table
on the read side to preserve the visible order in procfs/sysfs.

Or am I overthinking this, and the real difference is negligible?

Thanks!
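
[To put the quoted savings formula in concrete terms, assuming for illustration
only a machine with NR_NODES = 2, NR_CPUS = 32 and sizeof(long) = 8 (not any
configuration stated in the patch):

Memory saved = ((21 * 3 * 2) + (21 * 2 * 2 * 32) + (21 * 3) + (21 * 2 * 32)) * 8
             = (126 + 2688 + 63 + 1344) * 8
             = 33768 bytes, i.e. roughly 33 KB per memcg.]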
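
[For readers skimming the thread, the following is a minimal, self-contained
userspace sketch of the translation-table pattern the patch applies. The enum
values, array, and helper names below are made up for illustration and are not
the kernel identifiers: a sparse enum, a dense array of the items actually
tracked, and a zero-initialized table mapping sparse index to dense index + 1,
so that untracked items come back as -1.

#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Sparse "source" enum: only some of these items are tracked. */
enum stat_item {
        STAT_A,
        STAT_B,
        STAT_C,
        STAT_D,
        STAT_E,
        NR_STAT_ITEMS,
};

/* Dense subset actually tracked, analogous to memcg_node_stat_items[]. */
static const unsigned int tracked_items[] = { STAT_B, STAT_D, STAT_E };
#define NR_TRACKED (sizeof(tracked_items) / sizeof(tracked_items[0]))

/*
 * Translation table: sparse index -> (dense index + 1).  The table is
 * zero-initialized, so 0 means "not tracked" and the lookup below turns
 * that into -1.
 */
static int8_t stats_index[NR_STAT_ITEMS];

static void init_stats_index(void)
{
        int8_t j = 0;

        for (unsigned int i = 0; i < NR_TRACKED; i++)
                stats_index[tracked_items[i]] = ++j;
}

static inline int stats_index_of(int idx)
{
        return stats_index[idx] - 1;    /* -1 if idx is not tracked */
}

int main(void)
{
        /* Dense storage: NR_TRACKED slots instead of NR_STAT_ITEMS. */
        long counters[NR_TRACKED] = { 0 };

        init_stats_index();

        assert(stats_index_of(STAT_A) == -1);   /* untracked item */
        assert(stats_index_of(STAT_D) == 1);    /* second tracked slot */

        counters[stats_index_of(STAT_D)] += 42;
        printf("STAT_D -> dense slot %d, value %ld\n",
               stats_index_of(STAT_D), counters[stats_index_of(STAT_D)]);
        return 0;
}

The "+1 on store, -1 on load" encoding mirrors the quoted memcg_stats_index():
a zero entry in the zero-initialized table means the item has no memcg
counterpart, and the subtraction turns that into a negative index the caller
can check for.]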