During page/folio reclaim, we check if a folio is referenced using
folio_referenced() to avoid reclaiming folios that have been recently
accessed (hot memory). The rationale is that this memory is likely to
be accessed soon, and hence reclaiming it will cause a refault.

For memcg reclaim, we currently only check accesses to the folio from
processes in the subtree of the target memcg. This behavior was
originally introduced by commit bed7161a519a ("Memory controller: make
page_referenced() cgroup aware") a long time ago. Back then, refaulted
pages would get charged to the memcg of the process that was faulting
them in, so it made sense to only consider accesses coming from
processes in the subtree of target_mem_cgroup. If a page was charged
to memcg A but only being accessed by a sibling memcg B, we would
reclaim it if memcg A is the reclaim target. memcg B can then fault it
back in and get charged for it appropriately.

Today, this behavior still makes sense for file pages. However, unlike
file pages, when swapbacked pages are refaulted they are charged to
the memcg that was originally charged for them when they were swapped
out. This means that if a swapbacked page is charged to memcg A but
only used by memcg B, and we reclaim it from memcg A, it would simply
be faulted back in and charged again to memcg A once memcg B accesses
it. In that sense, accesses from all memcgs matter equally when
considering whether a swapbacked page/folio is a viable reclaim
target.

Modify folio_referenced() to always consider accesses from all memcgs
if the folio is swapbacked.

Signed-off-by: Yosry Ahmed <yosryahmed@xxxxxxxxxx>
---
v1 -> v2:
- Move the folio_test_swapbacked() check inside folio_referenced()
  (Johannes).
- Slight rephrasing of the commit log and comment to make them
  clearer.
- Renamed memcg argument to folio_referenced() to target_memcg.
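As a side note for reviewers, the new gating condition can be modeled
in a small userspace sketch. The names below (folio_model,
skip_foreign_references, have_target_memcg) are hypothetical and only
illustrate the logic; this is not the kernel implementation:

```c
/*
 * Userspace sketch of the patch's condition in folio_referenced():
 * the invalid_vma callback (which filters out references from
 * processes outside the target memcg's subtree) is installed only
 * when reclaim has a target memcg AND the folio is not swapbacked.
 * Swapbacked folios always count references from all memcgs.
 */
#include <stdbool.h>

struct folio_model {
	bool swapbacked;	/* would be folio_test_swapbacked() */
};

/* Mirrors: if (target_memcg && !folio_test_swapbacked(folio)) */
static bool skip_foreign_references(const struct folio_model *folio,
				    bool have_target_memcg)
{
	return have_target_memcg && !folio->swapbacked;
}
```

So for global reclaim (no target memcg) nothing changes, and for memcg
reclaim only file folios keep the subtree-only reference check.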
---
 include/linux/rmap.h |  2 +-
 mm/rmap.c            | 35 +++++++++++++++++++++++++++--------
 2 files changed, 28 insertions(+), 9 deletions(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index ca3e4ba6c58c..8130586eb637 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -352,7 +352,7 @@ static inline int page_try_share_anon_rmap(struct page *page)
  * Called from mm/vmscan.c to handle paging out
  */
 int folio_referenced(struct folio *, int is_locked,
-		     struct mem_cgroup *memcg, unsigned long *vm_flags);
+		     struct mem_cgroup *target_memcg, unsigned long *vm_flags);
 
 void try_to_migrate(struct folio *folio, enum ttu_flags flags);
 void try_to_unmap(struct folio *, enum ttu_flags flags);
diff --git a/mm/rmap.c b/mm/rmap.c
index b6743c2b8b5f..7c98d0ca7cc6 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -882,7 +882,7 @@ static bool invalid_folio_referenced_vma(struct vm_area_struct *vma, void *arg)
  * folio_referenced() - Test if the folio was referenced.
  * @folio: The folio to test.
  * @is_locked: Caller holds lock on the folio.
- * @memcg: target memory cgroup
+ * @target_memcg: target memory cgroup of reclaim.
  * @vm_flags: A combination of all the vma->vm_flags which referenced the folio.
  *
  * Quick test_and_clear_referenced for all mappings of a folio,
@@ -891,12 +891,12 @@ static bool invalid_folio_referenced_vma(struct vm_area_struct *vma, void *arg)
  * the function bailed out due to rmap lock contention.
  */
 int folio_referenced(struct folio *folio, int is_locked,
-		     struct mem_cgroup *memcg, unsigned long *vm_flags)
+		     struct mem_cgroup *target_memcg, unsigned long *vm_flags)
 {
 	int we_locked = 0;
 	struct folio_referenced_arg pra = {
 		.mapcount = folio_mapcount(folio),
-		.memcg = memcg,
+		.memcg = target_memcg,
 	};
 	struct rmap_walk_control rwc = {
 		.rmap_one = folio_referenced_one,
@@ -919,13 +919,32 @@ int folio_referenced(struct folio *folio, int is_locked,
 	}
 
 	/*
-	 * If we are reclaiming on behalf of a cgroup, skip
-	 * counting on behalf of references from different
-	 * cgroups
+	 * We check references to folios to make sure we don't reclaim hot
+	 * folios that are likely to be refaulted soon. If we are reclaiming
+	 * memory on behalf of a memcg, we may want to skip references from
+	 * processes outside the target memcg's subtree.
+	 *
+	 * For file folios, we only consider references from processes in the
+	 * subtree of the target memcg. If memcg A is under reclaim, and a
+	 * folio is charged to memcg A but only referenced by processes in
+	 * memcg B, we ignore references from memcg B and try to reclaim it.
+	 * If it is later accessed by memcg B it will be faulted back in and
+	 * charged appropriately to memcg B. For memcg A, this is cold memory
+	 * that should be reclaimed.
+	 *
+	 * On the other hand, when swapbacked folios are faulted in, they get
+	 * charged to the memcg that was originally charged for them at the time
+	 * of swapping out. This means that if a folio that is charged to
+	 * memcg A gets swapped out, it will get charged back to A when any
+	 * process in any memcg accesses it. In that sense, we need to consider
+	 * references from all memcgs when considering whether to reclaim a
+	 * swapbacked folio.
+	 *
+	 * Hence, only skip references from outside the target memcg (if any) if
+	 * the folio is not swapbacked.
 	 */
-	if (memcg) {
+	if (target_memcg && !folio_test_swapbacked(folio))
 		rwc.invalid_vma = invalid_folio_referenced_vma;
-	}
 
 	rmap_walk(folio, &rwc);
 	*vm_flags = pra.vm_flags;
-- 
2.38.0.rc1.362.ged0d419d3c-goog