On Thu, Jan 25, 2024 at 6:17 PM Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
>
>
> The patch titled
>      Subject: mm: memcg: optimize parent iteration in memcg_rstat_updated()
> has been added to the -mm mm-hotfixes-unstable branch. Its filename is
>      mm-memcg-optimize-parent-iteration-in-memcg_rstat_updated.patch
>
> This patch will shortly appear at
>      https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patches/mm-memcg-optimize-parent-iteration-in-memcg_rstat_updated.patch
>
> This patch will later appear in the mm-hotfixes-unstable branch at
>      git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
>
> Before you just go and hit "reply", please:
>    a) Consider who else should be cc'ed
>    b) Prefer to cc a suitable mailing list as well
>    c) Ideally: find the original patch on the mailing list and do a
>       reply-to-all to that, adding suitable additional cc's
>
> *** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
>
> The -mm tree is included into linux-next via the mm-everything
> branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
> and is updated there every 2-3 working days
>
> ------------------------------------------------------
> From: Yosry Ahmed <yosryahmed@xxxxxxxxxx>
> Subject: mm: memcg: optimize parent iteration in memcg_rstat_updated()
> Date: Wed, 24 Jan 2024 10:00:22 +0000
>
> In memcg_rstat_updated(), we iterate the memcg being updated and its
> parents to update memcg->vmstats_percpu->stats_updates in the fast path
> (i.e. no atomic updates). According to my math, this is 3 memory loads
> (and potentially 3 cache misses) per memcg:
> - Load the address of memcg->vmstats_percpu.
> - Load vmstats_percpu->stats_updates (based on some percpu calculation).
> - Load the address of the parent memcg.
>
> Avoid most of the cache misses by caching a pointer from each struct
> memcg_vmstats_percpu to its parent on the corresponding CPU.
> In this case, for the first memcg we have 2 memory loads (same as above):
> - Load the address of memcg->vmstats_percpu.
> - Load vmstats_percpu->stats_updates (based on some percpu calculation).
>
> Then for each additional memcg, we need a single load to get the
> parent's stats_updates directly. This reduces the number of loads from
> O(3N) to O(2+N) -- where N is the number of memcgs we need to iterate.

Hey Andrew,

Do you mind correcting this in place: s/O(2+N)/O(1+N)/g?

The first memcg requires 2 loads, and each of the other N-1 memcgs
requires 1 load, so that's 2 + (N-1) = 1+N, not 2+N.
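
For anyone who wants to check the 3N vs. 1+N arithmetic, here is a minimal
userspace sketch of the two walks. It is a toy model, not the kernel code:
the struct layouts, the static arrays standing in for percpu storage, and
the names build_chain(), walk_uncached() and walk_cached() are all
assumptions made for illustration. It counts one "load" per item in the
commit message's accounting, and it assumes stats_updates and the cached
parent pointer are fetched together, so each additional level costs one load.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DEPTH 8	/* hypothetical limits, for this model only */
#define NR_CPUS   2

/* Model of struct memcg_vmstats_percpu: the counter and the cached parent. */
struct vmstats_percpu {
	unsigned int stats_updates;
	struct vmstats_percpu *parent;	/* parent's stats on the same CPU */
};

/* Model of the memcg: only the two fields the walk touches. */
struct memcg {
	struct memcg *parent;
	struct vmstats_percpu *vmstats_percpu;	/* array indexed by CPU */
};

static struct memcg memcgs[MAX_DEPTH];
static struct vmstats_percpu stats[MAX_DEPTH][NR_CPUS];

/* Build a chain: memcgs[0] is the root, memcgs[depth - 1] is the leaf. */
static struct memcg *build_chain(int depth)
{
	assert(depth >= 1 && depth <= MAX_DEPTH);
	for (int i = 0; i < depth; i++) {
		memcgs[i].parent = i ? &memcgs[i - 1] : NULL;
		memcgs[i].vmstats_percpu = stats[i];
		for (int cpu = 0; cpu < NR_CPUS; cpu++) {
			stats[i][cpu].stats_updates = 0;
			stats[i][cpu].parent = i ? &stats[i - 1][cpu] : NULL;
		}
	}
	return &memcgs[depth - 1];
}

/* Walk via memcg->parent: 3 counted loads per level. */
static int walk_uncached(struct memcg *memcg, int cpu, unsigned int val)
{
	int loads = 0;

	while (memcg) {
		struct vmstats_percpu *statc = &memcg->vmstats_percpu[cpu];

		loads++;			/* memcg->vmstats_percpu */
		statc->stats_updates += val;
		loads++;			/* statc->stats_updates */
		memcg = memcg->parent;
		loads++;			/* memcg->parent */
	}
	return loads;
}

/* Walk via the cached per-CPU parent: 2 loads, then 1 per extra level. */
static int walk_cached(struct memcg *memcg, int cpu, unsigned int val)
{
	struct vmstats_percpu *statc = &memcg->vmstats_percpu[cpu];
	int loads = 1;				/* memcg->vmstats_percpu */

	for (; statc; statc = statc->parent) {
		statc->stats_updates += val;
		loads++;			/* this level's per-CPU struct */
	}
	return loads;
}
```

For a chain of N memcgs, walk_uncached() returns 3N and walk_cached()
returns 1+N under this accounting. The counts are a bookkeeping model of
the commit message, not measured cache misses.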