On Mon, 12 Aug 2019 15:29:10 -0700 Roman Gushchin <guro@xxxxxx> wrote:

> Percpu caching of local vmstats with the conditional propagation
> by the cgroup tree leads to an accumulation of errors on non-leaf
> levels.
>
> Let's imagine two nested memory cgroups A and A/B. Say, a process
> belonging to A/B allocates 100 pagecache pages on the CPU 0.
> The percpu cache will spill 3 times, so that 32*3=96 pages will be
> accounted to A/B and A atomic vmstat counters, 4 pages will remain
> in the percpu cache.
>
> Imagine A/B is nearby memory.max, so that every following allocation
> triggers a direct reclaim on the local CPU. Say, each such attempt
> will free 16 pages on a new cpu. That means every percpu cache will
> have -16 pages, except the first one, which will have 4 - 16 = -12.
> A/B and A atomic counters will not be touched at all.
>
> Now a user removes A/B. All percpu caches are freed and corresponding
> vmstat numbers are forgotten. A has 96 pages more than expected.
>
> As memory cgroups are created and destroyed, errors do accumulate.
> Even 1-2 pages differences can accumulate into large numbers.
>
> To fix this issue let's accumulate and propagate percpu vmstat
> values before releasing the memory cgroup. At this point these
> numbers are stable and cannot be changed.
>
> Since on cpu hotplug we do flush percpu vmstats anyway, we can
> iterate only over online cpus.
>
> Fixes: 42a300353577 ("mm: memcontrol: fix recursive statistics correctness & scalabilty")

Is this not serious enough for a cc:stable?
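
For reference, the arithmetic in the quoted changelog can be reproduced with a small user-space model. This is only a sketch under stated assumptions, not the kernel code: the `Memcg` class, `mod_state()`, `flush_percpu()` and `simulate()` are hypothetical names, the spill threshold is taken as exactly 32 pages to match the changelog's 32*3=96 (the in-kernel check may differ slightly), and the number of reclaiming CPUs (6) is an arbitrary choice for illustration.

```python
BATCH = 32  # assumed spill threshold, matching the changelog's arithmetic


class Memcg:
    """Toy model of a memory cgroup with one stat counter (hypothetical)."""

    def __init__(self, name, parent=None, ncpus=8):
        self.name = name
        self.parent = parent
        self.atomic = 0             # shared (hierarchical) counter
        self.percpu = [0] * ncpus   # per-CPU cached deltas

    def _propagate(self, delta):
        # Add delta to this cgroup and every ancestor.
        node = self
        while node is not None:
            node.atomic += delta
            node = node.parent

    def mod_state(self, cpu, pages):
        # Cache the delta per CPU; spill up the tree once it reaches BATCH.
        x = self.percpu[cpu] + pages
        if abs(x) >= BATCH:
            self._propagate(x)
            x = 0
        self.percpu[cpu] = x

    def flush_percpu(self):
        # The proposed fix, as modelled here: fold all cached per-CPU
        # deltas into the counters before the cgroup goes away.
        self._propagate(sum(self.percpu))
        self.percpu = [0] * len(self.percpu)


def simulate(flush_on_release, reclaim_cpus=6):
    a = Memcg("A")
    b = Memcg("A/B", parent=a)
    for _ in range(100):            # 100 pages charged one by one on CPU 0
        b.mod_state(0, 1)
    for cpu in range(reclaim_cpus): # each reclaim frees 16 pages on a new CPU
        b.mod_state(cpu, -16)
    if flush_on_release:
        b.flush_percpu()
    return a.atomic                 # what A reports after A/B is removed


if __name__ == "__main__":
    print("A without flush:", simulate(False))
    print("A with flush:   ", simulate(True))
```

In this model 96 pages were actually freed (6 * 16), so 4 pages remain charged. Without the flush, A keeps reporting the 96 pages that were spilled earlier, because none of the per-CPU deltas (-12 on CPU 0, -16 elsewhere) ever reaches the threshold. With the flush, A reports 4.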