On Mon 05-12-22 18:34:05, Mina Almasry wrote:
> commit 3f1509c57b1b ("Revert "mm/vmscan: never demote for memcg
> reclaim"") enabled demotion in memcg reclaim, which is the right thing
> to do, however, it introduced a regression in the behavior of
> try_to_free_mem_cgroup_pages().
>
> The callers of try_to_free_mem_cgroup_pages() expect it to attempt to
> reclaim - not demote - nr_pages from the cgroup. I.e. the memory usage
> of the cgroup should reduce by nr_pages. The callers expect
> try_to_free_mem_cgroup_pages() to also return the number of pages
> reclaimed, not demoted.
>
> However, what try_to_free_mem_cgroup_pages() actually does is it
> unconditionally counts demoted pages as reclaimed pages. So in practice
> when it is called it will often demote nr_pages and return the number of
> demoted pages to the caller. Demoted pages don't lower the memcg usage,
> and so try_to_free_mem_cgroup_pages() is not actually doing what the
> callers want it to do.
>
> Various things work suboptimally on memory tiered systems or don't work
> at all due to this:
>
> - memory.high enforcement likely doesn't work (it just demotes nr_pages
> instead of lowering the memcg usage by nr_pages).
> - try_charge_memcg() will keep retrying the charge while
> try_to_free_mem_cgroup_pages() is just demoting pages and not actually
> making any room for the charge.

This has been brought up during the review
https://lore.kernel.org/all/YoYTEDD+c4GT0xYY@xxxxxxxxxxxxxx/

> - memory.reclaim has a wonky interface. It advertises to the user it
> reclaims the provided amount but it will actually often demote that
> amount.
>
> There may be more effects to this issue.
>
> To fix these issues I propose shrink_folio_list() to only count pages
> demoted from inside of sc->nodemask to outside of sc->nodemask as
> 'reclaimed'.

Could you expand on why the node mask matters? From the charge point of
view it should be completely uninteresting as the charge remains.
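
To make the charge argument concrete, here is a toy userspace model.
It is only a sketch, not kernel code: struct toy_memcg, toy_demote()
and toy_reclaim() are made-up names, and a two-node layout (node 0 as
the top tier, node 1 as the lower tier) is assumed for illustration.

```c
#include <assert.h>

/* Toy model only; every name here is hypothetical, not a kernel API. */
struct toy_memcg {
	long charged;		/* pages charged to the cgroup */
	long on_node[2];	/* node 0: top tier, node 1: lower tier */
};

/*
 * Demotion migrates pages from node 0 to node 1. The pages stay
 * charged to the same cgroup, so 'charged' is not touched.
 * Returns the number of pages demoted.
 */
static long toy_demote(struct toy_memcg *cg, long nr)
{
	if (nr > cg->on_node[0])
		nr = cg->on_node[0];
	cg->on_node[0] -= nr;
	cg->on_node[1] += nr;
	return nr;
}

/*
 * Actual reclaim frees pages on the given node and uncharges them.
 * Returns the number of pages discharged.
 */
static long toy_reclaim(struct toy_memcg *cg, int node, long nr)
{
	assert(node == 0 || node == 1);
	if (nr > cg->on_node[node])
		nr = cg->on_node[node];
	cg->on_node[node] -= nr;
	cg->charged -= nr;
	return nr;
}
```

Starting from charged = 100 and on_node = { 60, 40 }, toy_demote(&cg, 32)
returns 32 but leaves charged at 100, whichever node the pages end up
on. Only toy_reclaim(&cg, 1, 32) brings charged down, to 68.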

I suspect we really need to change the reclaim metrics for memcg
reclaim. In the memory balancing reclaim we can indeed consider
demotions as a reclaim because the memory is freed in the end, but for
the memcg reclaim we really should be counting discharges instead. No
demotion/migration will free up charges.
--
Michal Hocko
SUSE Labs