Re: [PATCH 1/2] mm: memcontrol: flush percpu vmstats before releasing memcg

Roman Gushchin <guro@xxxxxx> · Tue, 13 Aug 2019 21:46:47 +0000

On Tue, Aug 13, 2019 at 02:27:52PM -0700, Andrew Morton wrote:
> On Mon, 12 Aug 2019 15:29:10 -0700 Roman Gushchin <guro@xxxxxx> wrote:
> 
> > Percpu caching of local vmstats with the conditional propagation
> > by the cgroup tree leads to an accumulation of errors on non-leaf
> > levels.
> > 
> > Let's imagine two nested memory cgroups A and A/B. Say, a process
> > belonging to A/B allocates 100 pagecache pages on the CPU 0.
> > The percpu cache will spill 3 times, so that 32*3=96 pages will be
> > accounted to A/B and A atomic vmstat counters, 4 pages will remain
> > in the percpu cache.
> > 
> > Imagine A/B is nearby memory.max, so that every following allocation
> > triggers a direct reclaim on the local CPU. Say, each such attempt
> > will free 16 pages on a new cpu. That means every percpu cache will
> > have -16 pages, except the first one, which will have 4 - 16 = -12.
> > A/B and A atomic counters will not be touched at all.
> > 
> > Now a user removes A/B. All percpu caches are freed and corresponding
> > vmstat numbers are forgotten. A has 96 pages more than expected.
> > 
> > As memory cgroups are created and destroyed, errors do accumulate.
> > Even 1-2 pages differences can accumulate into large numbers.
> > 
> > To fix this issue let's accumulate and propagate percpu vmstat
> > values before releasing the memory cgroup. At this point these
> > numbers are stable and cannot be changed.
> > 
> > Since on cpu hotplug we do flush percpu vmstats anyway, we can
> > iterate only over online cpus.
> > 
> > Fixes: 42a300353577 ("mm: memcontrol: fix recursive statistics correctness & scalabilty")
> 
> Is this not serious enough for a cc:stable?

I hope the "Fixes" tag will work, but yeah, my bad, cc:stable is definitely
a good idea here.

Added stable@ to cc.

Thanks!