On Tue, Aug 13, 2019 at 02:27:52PM -0700, Andrew Morton wrote:
> On Mon, 12 Aug 2019 15:29:10 -0700 Roman Gushchin <guro@xxxxxx> wrote:
>
> > Percpu caching of local vmstats with the conditional propagation
> > by the cgroup tree leads to an accumulation of errors on non-leaf
> > levels.
> >
> > Let's imagine two nested memory cgroups A and A/B. Say, a process
> > belonging to A/B allocates 100 pagecache pages on the CPU 0.
> > The percpu cache will spill 3 times, so that 32*3=96 pages will be
> > accounted to A/B and A atomic vmstat counters, 4 pages will remain
> > in the percpu cache.
> >
> > Imagine A/B is nearby memory.max, so that every following allocation
> > triggers a direct reclaim on the local CPU. Say, each such attempt
> > will free 16 pages on a new cpu. That means every percpu cache will
> > have -16 pages, except the first one, which will have 4 - 16 = -12.
> > A/B and A atomic counters will not be touched at all.
> >
> > Now a user removes A/B. All percpu caches are freed and corresponding
> > vmstat numbers are forgotten. A has 96 pages more than expected.
> >
> > As memory cgroups are created and destroyed, errors do accumulate.
> > Even 1-2 pages differences can accumulate into large numbers.
> >
> > To fix this issue let's accumulate and propagate percpu vmstat
> > values before releasing the memory cgroup. At this point these
> > numbers are stable and cannot be changed.
> >
> > Since on cpu hotplug we do flush percpu vmstats anyway, we can
> > iterate only over online cpus.
> >
> > Fixes: 42a300353577 ("mm: memcontrol: fix recursive statistics correctness & scalabilty")
>
> Is this not serious enough for a cc:stable?

I hope the "Fixes" tag will work, but yeah, my bad, cc:stable is
definitely a good idea here.

Added stable@ to cc.

Thanks!
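The drift described in the quoted changelog can be reproduced with a small single-threaded model. This is a hypothetical sketch, not the memcontrol.c code. The names `struct memcg`, `mod_state()`, `flush_percpu()` and `replay()` and the constants `NR_CPUS` and `BATCH` are made up for illustration. The spill rule (propagate once the cached value reaches 32 in absolute value) is an assumption chosen to match the 32*3=96 arithmetic in the changelog. Plain `long` fields stand in for the kernel's per-CPU variables and atomics.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified model: NR_CPUS and BATCH are illustrative. */
#define NR_CPUS 4
#define BATCH   32

struct memcg {
	struct memcg *parent;
	long atomic_count;      /* shared counter (atomic in the kernel) */
	long percpu[NR_CPUS];   /* per-CPU deltas not yet propagated */
};

/* Accumulate delta in this CPU's cache; once |cache| reaches BATCH,
 * spill it into the shared counters of cg and all of its ancestors. */
static void mod_state(struct memcg *cg, int cpu, long delta)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	long x = cg->percpu[cpu] + delta;

	if (labs(x) >= BATCH) {
		for (struct memcg *mi = cg; mi; mi = mi->parent)
			mi->atomic_count += x;
		x = 0;
	}
	cg->percpu[cpu] = x;
}

/* Model of the proposed fix: before cg is released, sum its leftover
 * per-CPU deltas and propagate them up the tree. */
static void flush_percpu(struct memcg *cg)
{
	long sum = 0;

	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		sum += cg->percpu[cpu];
		cg->percpu[cpu] = 0;
	}
	for (struct memcg *mi = cg; mi; mi = mi->parent)
		mi->atomic_count += sum;
}

/* Replay the changelog scenario and return A's counter after A/B is
 * removed, with or without flushing A/B's per-CPU caches first. */
static long replay(int flush_before_free)
{
	struct memcg a = { 0 };
	struct memcg b = { .parent = &a };

	for (int i = 0; i < 100; i++)            /* 100 pages charged on CPU 0 */
		mod_state(&b, 0, 1);
	for (int cpu = 0; cpu < NR_CPUS; cpu++)  /* reclaim frees 16 per CPU */
		mod_state(&b, cpu, -16);

	if (flush_before_free)
		flush_percpu(&b);
	/* b goes out of scope here: any unflushed per-CPU deltas are lost. */
	return a.atomic_count;
}
```

In this model `replay(0)` returns 96 even though only 100 - 4*16 = 36 pages are still charged, because the leftover caches (-12, -16, -16, -16) vanish with A/B. `replay(1)` returns 36. Iterating over all `NR_CPUS` here corresponds to the online CPUs in the changelog, since the model has no hotplug.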