On Mon 12-08-19 15:29:11, Roman Gushchin wrote:
> I've noticed that the "slab" value in memory.stat is sometimes 0,
> even if some children memory cgroups have a non-zero "slab" value.
> The following investigation showed that this is the result
> of the kmem_cache reparenting in combination with the per-cpu
> batching of slab vmstats.
>
> At offlining, some vmstat values may be left in the percpu cache,
> not being propagated upwards by the cgroup hierarchy. It means
> that stats on ancestor levels are lower than actual. Later, when
> slab pages are released, the precise number of pages is subtracted
> on the parent level, making the value negative. We don't show negative
> values; 0 is printed instead.

So the difference from other counters is that the slab ones are reparented,
and that's why we have to treat them specially? I guess that is what the
comment in the code suggests, but being explicit about it in the changelog
would be nice.

[...]

> -static void memcg_flush_percpu_vmstats(struct mem_cgroup *memcg)
> +static void memcg_flush_percpu_vmstats(struct mem_cgroup *memcg, bool slab_only)
>  {
>  	unsigned long stat[MEMCG_NR_STAT];
>  	struct mem_cgroup *mi;
>  	int node, cpu, i;
> +	int min_idx, max_idx;
>
> -	for (i = 0; i < MEMCG_NR_STAT; i++)
> +	if (slab_only) {
> +		min_idx = NR_SLAB_RECLAIMABLE;
> +		max_idx = NR_SLAB_UNRECLAIMABLE;
> +	} else {
> +		min_idx = 0;
> +		max_idx = MEMCG_NR_STAT;
> +	}

This is just ugly as hell! I really detest how this implicitly makes these
counter values very special without any note in the node_stat_item
definition. Is it such a big deal to have a per-counter flush and put the
loop over all counters (resp. the specific slab counters) around it? Would
that be so much worse? This should really be a slow path, so there is no
need to save a few instructions or cache misses here, no?

-- 
Michal Hocko
SUSE Labs
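
[Editor's sketch] For illustration, a minimal sketch of the per-counter flush
suggested above might look as follows. memcg_flush_percpu_vmstat_one() is a
hypothetical helper name; the field accesses (vmstats_percpu, vmstats) and the
parent walk mirror the existing memcg_flush_percpu_vmstats() in the patch under
discussion, and the per-node (lruvec) counters, which the real function also
flushes, are omitted for brevity.

/*
 * Hypothetical helper: fold the percpu deltas of a single vmstat counter
 * into the atomic counters of this memcg and all of its ancestors.
 */
static void memcg_flush_percpu_vmstat_one(struct mem_cgroup *memcg, int idx)
{
	struct mem_cgroup *mi;
	unsigned long stat = 0;
	int cpu;

	for_each_online_cpu(cpu)
		stat += per_cpu(memcg->vmstats_percpu->stat[idx], cpu);

	for (mi = memcg; mi; mi = parent_mem_cgroup(mi))
		atomic_long_add(stat, &mi->vmstats[idx]);
}

static void memcg_flush_percpu_vmstats(struct mem_cgroup *memcg, bool slab_only)
{
	int i;

	if (slab_only) {
		/* Only the reparented slab counters need to be flushed. */
		memcg_flush_percpu_vmstat_one(memcg, NR_SLAB_RECLAIMABLE);
		memcg_flush_percpu_vmstat_one(memcg, NR_SLAB_UNRECLAIMABLE);
	} else {
		for (i = 0; i < MEMCG_NR_STAT; i++)
			memcg_flush_percpu_vmstat_one(memcg, i);
	}
}

The trade-off is one pass over the online CPUs per counter instead of a single
pass accumulating into a stack array, which only matters if this path were hot;
as noted above, it is an offlining slow path.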