On Tue, Jul 13, 2021 at 1:24 PM Shakeel Butt <shakeelb@xxxxxxxxxx> wrote:
>
> At the moment memcg stats are read in four contexts:
>
> 1. memcg stat user interfaces
> 2. dirty throttling
> 3. page fault
> 4. memory reclaim
>
> Currently the kernel flushes the stats for the first two cases. Flushing
> the stats for the remaining two cases may have a performance impact.
> Always flushing the memcg stats on the page fault code path may
> negatively impact the performance of applications. In addition, flushing
> in the memory reclaim code path, though treated as a slowpath, can become
> a source of contention for the global lock taken for stat flushing,
> because when the system or a memcg is under memory pressure, many tasks
> may enter the reclaim path.
>
> This patch uses the following mechanisms to solve these challenges:
>
> 1. Periodically flush the stats from the root memcg every 2 seconds. This
> puts a time limit on how long the stats can stay out of sync.
>
> 2. Asynchronously flush the stats after a fixed number of stat updates.
> In the worst case the stats can be out of sync by O(nr_cpus * BATCH) for
> 2 seconds.
>
> 3. To avoid a thundering herd of flushers, particularly from the memory
> reclaim context, introduce a memcg local spinlock and let only one
> flusher be active at a time. This could have been done through
> cgroup_rstat_lock, but that lock is used by other subsystems and by
> userspace reading memcg stats. So, it is better to keep the flushers
> introduced by this patch decoupled from cgroup_rstat_lock.
> ---
> Changes since v2:
> - Changed the subject of the patch
> - Added mechanism to bound errors to nr_cpus instead of nr_cgroups
> - memcg local lock to let one active flusher
>
> Changes since v1:
> - use system_unbound_wq for flushing the memcg stats
>

Forgot to add v3 in the subject for this patch.
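
For anyone following the thread, the three mechanisms in the quoted description can be illustrated with a minimal user-space sketch. This is not the code from the patch. NR_CPUS, FLUSH_BATCH, and every function and variable name below are hypothetical, the per-CPU counters are plain arrays driven from a single thread, and the flush runs inline where the kernel would queue asynchronous work. A C11 atomic_flag stands in for the memcg local spinlock used as a trylock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NR_CPUS     4   /* hypothetical CPU count for this sketch */
#define FLUSH_BATCH 64  /* hypothetical per-CPU update threshold ("BATCH") */

static long percpu_delta[NR_CPUS];        /* updates not yet flushed */
static unsigned percpu_updates[NR_CPUS];  /* update count since last flush */
static long global_stat;                  /* aggregated value seen by readers */
static unsigned flush_count;              /* number of completed flushes */

/* Stand-in for the memcg local spinlock; used only as a trylock. */
static atomic_flag flush_lock = ATOMIC_FLAG_INIT;

/*
 * Mechanism 3: at most one flusher at a time. A caller that finds the
 * lock held returns immediately instead of queueing up behind it.
 */
static bool try_flush_stats(void)
{
    if (atomic_flag_test_and_set(&flush_lock))
        return false;  /* another flusher is active */

    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        global_stat += percpu_delta[cpu];
        percpu_delta[cpu] = 0;
        percpu_updates[cpu] = 0;
    }
    flush_count++;

    atomic_flag_clear(&flush_lock);
    return true;
}

/*
 * Mechanism 2: request a flush once a CPU has accumulated FLUSH_BATCH
 * updates, so at most NR_CPUS * (FLUSH_BATCH - 1) updates stay pending.
 * The flush is called inline here; the kernel would defer it to a worker.
 */
static void mod_stat(int cpu, long val)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    percpu_delta[cpu] += val;
    if (++percpu_updates[cpu] >= FLUSH_BATCH)
        try_flush_stats();
}

/*
 * Mechanism 1: body of the periodic flusher. In the kernel this would
 * be re-armed every 2 seconds; here the caller decides when to run it.
 */
static void periodic_flush(void)
{
    try_flush_stats();
}
```

Calling mod_stat() fewer than FLUSH_BATCH times on one CPU leaves global_stat untouched until either the threshold is reached or periodic_flush() runs. A caller that hits try_flush_stats() while the flag is held simply gets false back, which is the behavior that keeps reclaimers from piling up on the lock.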