On Mon 12-08-19 16:37:54, Roman Gushchin wrote:
> Similar to vmstats, percpu caching of local vmevents leads to an
> accumulation of errors on non-leaf levels. This happens because
> some leftovers may remain in percpu caches, so that they are
> never propagated up by the cgroup tree and just disappear into
> nonexistence with on releasing of the memory cgroup.
>
> To fix this issue let's accumulate and propagate percpu vmevents
> values before releasing the memory cgroup similar to what we're
> doing with vmstats.
>
> Since on cpu hotplug we do flush percpu vmstats anyway, we can
> iterate only over online cpus.
>
> Fixes: 42a300353577 ("mm: memcontrol: fix recursive statistics correctness & scalabilty")
> Signed-off-by: Roman Gushchin <guro@xxxxxx>
> Cc: Johannes Weiner <hannes@xxxxxxxxxxx>

Acked-by: Michal Hocko <mhocko@xxxxxxxx>

> ---
>  mm/memcontrol.c | 22 +++++++++++++++++++++-
>  1 file changed, 21 insertions(+), 1 deletion(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 6d2427abcc0c..249187907339 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -3459,6 +3459,25 @@ static void memcg_flush_percpu_vmstats(struct mem_cgroup *memcg, bool slab_only)
>  	}
>  }
>
> +static void memcg_flush_percpu_vmevents(struct mem_cgroup *memcg)
> +{
> +	unsigned long events[NR_VM_EVENT_ITEMS];
> +	struct mem_cgroup *mi;
> +	int cpu, i;
> +
> +	for (i = 0; i < NR_VM_EVENT_ITEMS; i++)
> +		events[i] = 0;
> +
> +	for_each_online_cpu(cpu)
> +		for (i = 0; i < NR_VM_EVENT_ITEMS; i++)
> +			events[i] += raw_cpu_read(
> +				memcg->vmstats_percpu->events[i]);
> +
> +	for (mi = memcg; mi; mi = parent_mem_cgroup(mi))
> +		for (i = 0; i < NR_VM_EVENT_ITEMS; i++)
> +			atomic_long_add(events[i], &mi->vmevents[i]);
> +}
> +
>  static void memcg_offline_kmem(struct mem_cgroup *memcg)
>  {
>  	struct cgroup_subsys_state *css;
> @@ -4860,10 +4879,11 @@ static void __mem_cgroup_free(struct mem_cgroup *memcg)
>  	int node;
>
>  	/*
> -	 * Flush percpu vmstats to guarantee the value correctness
> +	 * Flush percpu vmstats and vmevents to guarantee the value correctness
>  	 * on parent's and all ancestor levels.
>  	 */
>  	memcg_flush_percpu_vmstats(memcg, false);
> +	memcg_flush_percpu_vmevents(memcg);
>  	for_each_node(node)
>  		free_mem_cgroup_per_node_info(memcg, node);
>  	free_percpu(memcg->vmstats_percpu);
> --
> 2.21.0

-- 
Michal Hocko
SUSE Labs
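
For readers following along, the leftover problem described in the changelog can be reproduced in a small userspace model. This is only a sketch under stated assumptions: struct group, count_event(), flush_percpu_events(), NR_CPUS, NR_EVENTS and BATCH are hypothetical names invented for illustration and are not the kernel's API. The model indexes each CPU's slot explicitly and zeroes it after reading, which is a choice of the sketch rather than something taken from the patch.

```c
#include <assert.h>

/* Hypothetical userspace model; names and sizes are illustrative only. */
#define NR_CPUS   4
#define NR_EVENTS 2
#define BATCH     32	/* cached delta above which a counter is propagated */

struct group {
	struct group *parent;
	unsigned long percpu[NR_CPUS][NR_EVENTS];	/* per-CPU cached deltas */
	unsigned long total[NR_EVENTS];			/* hierarchical totals */
};

/* Add a delta to this group and to every ancestor. */
static void propagate(struct group *g, int idx, unsigned long delta)
{
	for (; g; g = g->parent)
		g->total[idx] += delta;
}

/* Count events on one CPU; propagate only once the cache exceeds BATCH. */
static void count_event(struct group *g, int cpu, int idx, unsigned long n)
{
	assert(cpu >= 0 && cpu < NR_CPUS && idx >= 0 && idx < NR_EVENTS);

	unsigned long x = g->percpu[cpu][idx] + n;

	if (x > BATCH) {
		propagate(g, idx, x);
		x = 0;
	}
	g->percpu[cpu][idx] = x;
}

/* Before a group goes away, push the leftovers of every CPU slot upwards. */
static void flush_percpu_events(struct group *g)
{
	unsigned long events[NR_EVENTS] = {0};

	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		for (int i = 0; i < NR_EVENTS; i++) {
			events[i] += g->percpu[cpu][i];
			g->percpu[cpu][i] = 0;	/* makes a second flush a no-op */
		}

	for (int i = 0; i < NR_EVENTS; i++)
		propagate(g, i, events[i]);
}
```

In this model, counting 10 events on one CPU and 5 on another leaves the parent's total at 0 because neither cache crossed BATCH. Only the flush makes the parent see all 15, which is the effect the changelog asks for at cgroup release time.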