On Wed, Jul 17, 2019 at 03:29:19PM +0300, Konstantin Khlebnikov wrote:
> This is alternative solution for problem addressed in commit 815744d75152
> ("mm: memcontrol: don't batch updates of local VM stats and events").
>
> Instead of adding second set of percpu counters which wastes memory and
> slows down showing statistics in cgroup-v1 this patch use two arrays of
> atomic counters: local and nested statistics.
>
> Then update has the same amount of atomic operations: local update and
> one nested for each parent cgroup. Readers of hierarchical statistics
> have to sum two atomics which isn't a big deal.
>
> All updates are still batched using one set of percpu counters.
>
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@xxxxxxxxxxxxxx>

Yeah that looks better. Note that it was never about the atomics,
though, but rather the number of cachelines dirtied. Your patch should
solve this problem as well, but it might be a good idea to run
will-it-scale on it to make sure the struct layout is still fine.
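
For readers following along, the scheme described in the quoted changelog can be sketched roughly as below. This is a userspace illustration under stated assumptions, not the patch itself: `struct group`, `mod_stat()`, `read_local()`, `read_hier()`, `NR_CPUS` and `BATCH` are hypothetical names and values, a plain array indexed by a caller-supplied CPU number stands in for real percpu data, and C11 atomics stand in for the kernel's atomic types. It shows one set of batched per-CPU deltas being flushed into a local atomic on the group itself and a nested atomic on each ancestor, with the hierarchical reader summing the two. It makes no claim about struct layout or cacheline behaviour, which is what a will-it-scale run would have to check.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

#define NR_CPUS 4   /* hypothetical CPU count for the sketch */
#define BATCH   32  /* hypothetical flush threshold */

struct group {
    struct group *parent;
    long percpu[NR_CPUS]; /* pending delta per CPU: the only batched set */
    atomic_long local;    /* events charged directly to this group */
    atomic_long nested;   /* events propagated up from descendants */
};

static void group_init(struct group *g, struct group *parent)
{
    g->parent = parent;
    for (int i = 0; i < NR_CPUS; i++)
        g->percpu[i] = 0;
    atomic_init(&g->local, 0);
    atomic_init(&g->nested, 0);
}

/*
 * Accumulate delta in the per-CPU slot. Once the pending value exceeds
 * BATCH in magnitude, flush it: one atomic add to this group's local
 * counter and one atomic add to the nested counter of every ancestor.
 * Assumes each cpu slot is only ever touched by one thread at a time.
 */
static void mod_stat(struct group *g, int cpu, long delta)
{
    assert(cpu >= 0 && cpu < NR_CPUS);

    long x = g->percpu[cpu] + delta;

    if (labs(x) > BATCH) {
        atomic_fetch_add(&g->local, x);
        for (struct group *p = g->parent; p != NULL; p = p->parent)
            atomic_fetch_add(&p->nested, x);
        x = 0;
    }
    g->percpu[cpu] = x;
}

/* Local statistic: only what was charged to this group itself. */
static long read_local(struct group *g)
{
    return atomic_load(&g->local);
}

/* Hierarchical statistic: the reader sums the two atomics. */
static long read_hier(struct group *g)
{
    return atomic_load(&g->local) + atomic_load(&g->nested);
}
```

With a root and one child, charging the child past the batch threshold makes the flushed amount visible in the child's local counter and in the root's hierarchical value, while the root's local counter stays at zero. Deltas still below the threshold remain in the per-CPU slot and are not yet seen by either reader.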