On Tue, Jan 23, 2024 at 1:33 AM Michal Hocko <mhocko@xxxxxxxx> wrote:
>
> On Sun 21-01-24 21:44:12, T.J. Mercier wrote:
> > This reverts commit 0388536ac29104a478c79b3869541524caec28eb.
> >
> > Proactive reclaim on the root cgroup is 10x slower after this patch when
> > MGLRU is enabled, and completion times for proactive reclaim on much
> > smaller non-root cgroups take ~30% longer (with or without MGLRU).
>
> What is the reclaim target in these pro-active reclaim requests?

Two targets:
1) /sys/fs/cgroup/memory.reclaim
2) /sys/fs/cgroup/uid_0/memory.reclaim (a bunch of Android system services)

Note that lru_gen_shrink_node is used for 1, but shrink_node_memcgs is
used for 2.

The 10x comes from the rate of reclaim for 1 (~70k pages/sec vs ~6.6k
pages/sec). After this revert, the root reclaim took only about 10
seconds. Before the revert, it was still running after about 3 minutes,
using a core at 100% the whole time, and I was too impatient to wait
longer to record times for comparison.

The 30% comes from the average of a few runs for 2:

Before revert:
$ adb wait-for-device && sleep 120 && adb root && adb shell -t 'time echo "" > /sys/fs/cgroup/uid_0/memory.reclaim'
restarting adbd as root
0m09.69s real 0m00.00s user 0m09.19s system

After revert:
$ adb wait-for-device && sleep 120 && adb root && adb shell -t 'time echo "" > /sys/fs/cgroup/uid_0/memory.reclaim'
0m07.31s real 0m00.00s user 0m06.44s system

The difference is actually bigger for smaller reclaim amounts:

Before revert:
$ adb wait-for-device && sleep 120 && adb root && adb shell -t 'time echo "3G" > /sys/fs/cgroup/uid_0/memory.reclaim'
0m12.04s real 0m00.00s user 0m11.48s system

After revert:
$ adb wait-for-device && sleep 120 && adb root && adb shell -t 'time echo "3G" > /sys/fs/cgroup/uid_0/memory.reclaim'
0m06.65s real 0m00.00s user 0m05.91s system

> > With
> > root reclaim before the patch, I observe average reclaim rates of
> > ~70k pages/sec before try_to_free_mem_cgroup_pages starts to fail and
> > the nr_retries counter starts to decrement, eventually ending the
> > proactive reclaim attempt.
>
> Do I understand correctly that the reclaim target is over estimated and
> you expect that the reclaim process breaks out early

Yes. When I specify a large amount to reclaim, I expect memory_reclaim
to fail at some point, once it becomes difficult or impossible to
reclaim more pages. The ask here is, "please reclaim as much as possible
from this cgroup, but don't take all day". But on the root cgroup it
takes minutes to get there, working SWAP_CLUSTER_MAX pages at a time.

> > After the patch the reclaim rate is
> > consistently ~6.6k pages/sec due to the reduced nr_pages value causing
> > scan aborts as soon as SWAP_CLUSTER_MAX pages are reclaimed. The
> > proactive reclaim doesn't complete after several minutes because
> > try_to_free_mem_cgroup_pages is still capable of reclaiming pages in
> > tiny SWAP_CLUSTER_MAX page chunks and nr_retries is never decremented.
>
> I do not understand this part. How does a smaller reclaim target manages
> to have reclaimed > 0 while larger one doesn't?

Both are able to make progress. The main difference is that, with MGLRU,
a single iteration of try_to_free_mem_cgroup_pages ends soon after it
reclaims nr_to_reclaim pages, before it touches all memcgs. So with
MGLRU, a single iteration really reclaims only about SWAP_CLUSTER_MAX
pages. Without MGLRU, the memcg walk is not aborted immediately after
nr_to_reclaim is reached, so a single call to
try_to_free_mem_cgroup_pages can reclaim thousands of pages even when
sc->nr_to_reclaim is 32 (i.e., MGLRU overreclaims less).
https://lore.kernel.org/lkml/20221201223923.873696-1-yuzhao@xxxxxxxxxx/
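The retry behavior described in this thread can be illustrated with a simplified userspace model of a memory_reclaim()-style loop. This is a sketch, not kernel code. It assumes that nr_retries is only consumed when a call reclaims zero pages, that the retry budget is 16, and that each reclaim call raises its target to at least SWAP_CLUSTER_MAX (32) and stops as soon as that target is met, which approximates the MGLRU early abort. The names model_memcg, model_try_to_free() and model_memory_reclaim() are hypothetical. The cap_batch flag approximates the reduced per-call nr_pages described in the thread. The uncapped case lets a single call take everything it asks for. That overstates real behavior, but it isolates the effect of the per-call request size.

```c
#include <assert.h>
#include <stdbool.h>

#define SWAP_CLUSTER_MAX 32UL
/* Assumed retry budget for this model. */
#define MAX_RECLAIM_RETRIES 16U

/* Hypothetical stand-in for a cgroup: just a count of reclaimable pages. */
struct model_memcg {
	unsigned long reclaimable;
};

/*
 * Hypothetical model of one reclaim call with an MGLRU-like early abort:
 * the target is raised to at least SWAP_CLUSTER_MAX, and the call stops
 * as soon as that target is met (or the cgroup runs out of pages).
 */
static unsigned long model_try_to_free(struct model_memcg *m,
				       unsigned long nr_pages)
{
	unsigned long target = nr_pages > SWAP_CLUSTER_MAX ? nr_pages
							  : SWAP_CLUSTER_MAX;
	unsigned long got = target < m->reclaimable ? target : m->reclaimable;

	m->reclaimable -= got;
	assert(got <= target);
	return got;
}

/*
 * Simplified retry loop. Returns 0 if the goal was met, or -1 (standing in
 * for -EAGAIN) once the retry budget is exhausted. *calls counts reclaim
 * attempts. When cap_batch is true, each call requests at most
 * SWAP_CLUSTER_MAX pages.
 */
static int model_memory_reclaim(struct model_memcg *m,
				unsigned long nr_to_reclaim, bool cap_batch,
				unsigned long *calls)
{
	unsigned int nr_retries = MAX_RECLAIM_RETRIES;
	unsigned long nr_reclaimed = 0;

	*calls = 0;
	while (nr_reclaimed < nr_to_reclaim) {
		unsigned long request = nr_to_reclaim - nr_reclaimed;
		unsigned long reclaimed;

		if (cap_batch && request > SWAP_CLUSTER_MAX)
			request = SWAP_CLUSTER_MAX;

		reclaimed = model_try_to_free(m, request);
		(*calls)++;

		/* Only a call that reclaims nothing consumes a retry. */
		if (!reclaimed && !nr_retries--)
			return -1;

		nr_reclaimed += reclaimed;
	}
	return 0;
}
```

Consider a model cgroup with 1000 reclaimable pages and an over-estimated goal of 1,000,000 pages. The capped loop makes 49 calls: 32 that reclaim something, then 17 empty ones that drain the retry budget. The uncapped loop makes 18 calls. While any pages remain, every small call succeeds and no retry is consumed. The number of productive calls therefore grows roughly as reclaimable / SWAP_CLUSTER_MAX. If each call carries a fixed cost (for instance, part of the memcg walk), that cost is paid correspondingly more often.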
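The quoted ratios can be checked against the numbers in the `time` output and the reported reclaim rates. The small sketch below uses made-up helper names (pct_longer, rate_ratio, approx_eq). It takes the "real" times from the runs shown for target 2 and the pages/sec figures for target 1.

```c
#include <assert.h>

/* Percentage by which the "before" time exceeds the "after" time. */
static double pct_longer(double before_s, double after_s)
{
	return (before_s / after_s - 1.0) * 100.0;
}

/* Ratio between a faster and a slower reclaim rate (pages/sec). */
static double rate_ratio(double fast, double slow)
{
	return fast / slow;
}

/* Absolute-tolerance comparison that avoids pulling in libm. */
static int approx_eq(double a, double b, double tol)
{
	double d = a - b;

	return (d < 0 ? -d : d) <= tol;
}
```

For example, pct_longer(9.69, 7.31) is about 32.6, so that run took roughly 33% longer before the revert, in line with the ~30% average. pct_longer(12.04, 6.65) is about 81.1 for the 3G runs. rate_ratio(70000.0, 6600.0) is about 10.6, which matches the "10x" figure.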