On Tue 23-01-24 05:58:05, T.J. Mercier wrote:
> On Tue, Jan 23, 2024 at 1:33 AM Michal Hocko <mhocko@xxxxxxxx> wrote:
> >
> > On Sun 21-01-24 21:44:12, T.J. Mercier wrote:
> > > This reverts commit 0388536ac29104a478c79b3869541524caec28eb.
> > >
> > > Proactive reclaim on the root cgroup is 10x slower after this patch when
> > > MGLRU is enabled, and completion times for proactive reclaim on much
> > > smaller non-root cgroups take ~30% longer (with or without MGLRU).
> >
> > What is the reclaim target in these proactive reclaim requests?
>
> Two targets:
> 1) /sys/fs/cgroup/memory.reclaim
> 2) /sys/fs/cgroup/uid_0/memory.reclaim (a bunch of Android system services)

OK, I was not really clear. I was curious about nr_to_reclaim.

> Note that lru_gen_shrink_node is used for 1, but shrink_node_memcgs is
> used for 2.
>
> The 10x comes from the rate of reclaim (~70k pages/sec vs ~6.6k
> pages/sec) for 1. After this revert the root reclaim took only about
> 10 seconds. Before the revert it's still running after about 3 minutes
> using a core at 100% the whole time, and I'm too impatient to wait
> longer to record times for comparison.
>
> The 30% comes from the average of a few runs for 2:
> Before revert:
> $ adb wait-for-device && sleep 120 && adb root && adb shell -t 'time
> echo "" > /sys/fs/cgroup/uid_0/memory.reclaim'

Ohh, so you want to reclaim all of it (or rather, as much as possible).

[...]

> > > After the patch the reclaim rate is
> > > consistently ~6.6k pages/sec due to the reduced nr_pages value causing
> > > scan aborts as soon as SWAP_CLUSTER_MAX pages are reclaimed. The
> > > proactive reclaim doesn't complete after several minutes because
> > > try_to_free_mem_cgroup_pages is still capable of reclaiming pages in
> > > tiny SWAP_CLUSTER_MAX page chunks and nr_retries is never decremented.
> >
> > I do not understand this part. How does a smaller reclaim target manage
> > to have reclaimed > 0 while a larger one doesn't?
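The retry behaviour described in the quoted commit message can be modelled in a few lines. This is a simplified userspace sketch, not the code in mm/memcontrol.c. It assumes that each call is capped at SWAP_CLUSTER_MAX pages (as after the reverted commit) and that nr_retries is decremented only when a call reclaims nothing. The helper names, the retry budget of 16 and the fixed pool of reclaimable pages are all hypothetical.

```python
SWAP_CLUSTER_MAX = 32
MAX_RECLAIM_RETRIES = 16  # assumed retry budget for this model


def model_try_to_free(pool, batch):
    """Hypothetical stand-in for try_to_free_mem_cgroup_pages() with MGLRU:
    the walk stops once `batch` pages are reclaimed, so a call returns at
    most `batch` pages. `pool` is a one-element list holding the number of
    reclaimable pages left."""
    got = min(batch, pool[0])
    pool[0] -= got
    return got


def model_memory_reclaim(reclaimable, nr_to_reclaim, cap_batch):
    """Simplified model of the memory.reclaim retry loop.

    Returns (calls, reclaimed, gave_up). nr_retries is only decremented on
    a call that makes no progress, so any call reclaiming > 0 pages keeps
    the loop going until nr_to_reclaim is met."""
    pool = [reclaimable]
    nr_retries = MAX_RECLAIM_RETRIES
    calls = reclaimed = 0
    while reclaimed < nr_to_reclaim:
        want = nr_to_reclaim - reclaimed
        # cap_batch=True models the reverted commit's SWAP_CLUSTER_MAX cap.
        batch = min(want, SWAP_CLUSTER_MAX) if cap_batch else want
        got = model_try_to_free(pool, batch)
        calls += 1
        if got == 0:
            if nr_retries == 0:
                return calls, reclaimed, True  # retry budget exhausted: give up
            nr_retries -= 1
        reclaimed += got
    return calls, reclaimed, False


# "Reclaim as much as possible": ask for far more than the 100000 pages available.
print(model_memory_reclaim(100_000, 1 << 30, cap_batch=True))   # (3142, 100000, True)
print(model_memory_reclaim(100_000, 1 << 30, cap_batch=False))  # (18, 100000, True)
```

In this model both variants drain the same pool and stop only after a run of zero-progress calls; the cap changes how many calls are needed (and so how much fixed per-call walk overhead is paid), not the termination condition itself. The model also ignores pages that become reclaimable while the loop runs; if every call keeps finding some pages, the zero-progress exit is never reached.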
>
> They both are able to make progress. The main difference is that a
> single iteration of try_to_free_mem_cgroup_pages with MGLRU ends soon
> after it reclaims nr_to_reclaim, and before it touches all memcgs. So
> a single iteration really will reclaim only about SWAP_CLUSTER_MAX-ish
> pages with MGLRU. Without MGLRU the memcg walk is not aborted
> immediately after nr_to_reclaim is reached, so a single call to
> try_to_free_mem_cgroup_pages can actually reclaim thousands of pages
> even when sc->nr_to_reclaim is 32. (i.e. MGLRU overreclaims less.)
> https://lore.kernel.org/lkml/20221201223923.873696-1-yuzhao@xxxxxxxxxx/

OK, I do see how try_to_free_mem_cgroup_pages might overreclaim, but I do
not really follow how increasing the batch actually fixes the issue that
there is always progress being made and therefore memory_reclaim takes
ages to terminate?

-- 
Michal Hocko
SUSE Labs
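The difference T.J. describes between the two memcg walks can be sketched as a toy model. This is not kernel code. It assumes every memcg yields a fixed, hypothetical number of pages per visit, that the MGLRU-style walk stops as soon as the per-call target is met, and that the other walk visits every memcg regardless. The function name and all numbers are illustrative only.

```python
def model_single_call(per_memcg, nr_to_reclaim, abort_when_met):
    """Pages reclaimed by one hypothetical try_to_free_mem_cgroup_pages() call.

    per_memcg: pages each memcg yields when visited (assumed values).
    abort_when_met: True models the MGLRU walk that stops once
    nr_to_reclaim is reached; False models a walk that visits every memcg.
    """
    reclaimed = 0
    for pages in per_memcg:
        reclaimed += pages
        if abort_when_met and reclaimed >= nr_to_reclaim:
            break
    return reclaimed


# Hypothetical hierarchy: 200 memcgs, each yielding 32 pages per visit.
memcgs = [32] * 200
print(model_single_call(memcgs, 32, abort_when_met=True))   # 32
print(model_single_call(memcgs, 32, abort_when_met=False))  # 6400
```

Under these assumptions, a per-call target of SWAP_CLUSTER_MAX costs the aborting walk one full call per ~32 pages, while the non-aborting walk still returns thousands of pages per call, which matches the thread's observation that the MGLRU path is the one hit hardest.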