On Mon, Aug 14, 2023 at 9:16 AM T.J. Mercier <tjmercier@xxxxxxxxxx> wrote:
>
> When a memcg is in the process of being released mem_cgroup_tryget will
> fail because its reference count has already reached 0. This can happen
> during reclaim if the memcg has already been offlined, and we reclaim
> all remaining pages attributed to the offlined memcg. shrink_many
> attempts to skip the empty memcg in this case, and continue reclaiming
> from the remaining memcgs in the old generation. If there is only one
> memcg remaining, or if all remaining memcgs are in the process of being
> released then shrink_many will spin until all memcgs have finished
> being released. The release occurs through a workqueue, so it can take
> a while before kswapd is able to make any further progress.
>
> This fix results in reductions in kswapd activity and direct reclaim in
> a test where 28 apps (working set size > total memory) are repeatedly
> launched in a random sequence:
>
>                                    A       B    delta  ratio(%)
> allocstall_movable              5962    3539    -2423    -40.64
> allocstall_normal               2661    2417     -244     -9.17
> kswapd_high_wmark_hit_quickly  53152    7594   -45558    -85.71
> pageoutrun                     57365   11750   -45615    -79.52
>
> Fixes: e4dde56cd208 ("mm: multi-gen LRU: per-node lru_gen_folio lists")
> Cc: stable@xxxxxxxxxxxxxxx
> Signed-off-by: T.J. Mercier <tjmercier@xxxxxxxxxx>

Acked-by: Yu Zhao <yuzhao@xxxxxxxxxx>