On 2024/3/8 14:41, 李培锋 wrote:
On 2024/3/8 12:56, Matthew Wilcox wrote:
On Fri, Mar 08, 2024 at 11:11:24AM +0800, lipeifeng@xxxxxxxx wrote:
Commit 6d4675e60135 ("mm: don't be stuck to rmap lock on reclaim path")
prevents the reclaim path from becoming stuck on the rmap lock. However,
it reinserts those folios at the head of the LRU during shrink_folio_list,
even if those folios are very cold.
This seems like a lot of new code. Did you consider something simpler
like this?
Also, this is Minchan's patch you're complaining about. Add him to the
cc.
+++ b/mm/vmscan.c
@@ -817,6 +817,7 @@ enum folio_references {
FOLIOREF_RECLAIM,
FOLIOREF_RECLAIM_CLEAN,
FOLIOREF_KEEP,
+ FOLIOREF_RESCAN,
FOLIOREF_ACTIVATE,
};
@@ -837,9 +838,9 @@ static enum folio_references folio_check_references(struct folio *folio,
if (vm_flags & VM_LOCKED)
return FOLIOREF_ACTIVATE;
- /* rmap lock contention: rotate */
+ /* rmap lock contention: keep at the tail */
if (referenced_ptes == -1)
- return FOLIOREF_KEEP;
+ return FOLIOREF_RESCAN;
if (referenced_ptes) {
/*
@@ -1164,6 +1165,7 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
case FOLIOREF_ACTIVATE:
goto activate_locked;
case FOLIOREF_KEEP:
+ case FOLIOREF_RESCAN:
stat->nr_ref_keep += nr_pages;
goto keep_locked;
case FOLIOREF_RECLAIM:
@@ -1446,7 +1448,10 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
keep_locked:
folio_unlock(folio);
keep:
- list_add(&folio->lru, &ret_folios);
+ if (references == FOLIOREF_RESCAN)
+ list_add(&folio->lru, &rescan_folios);
+ else
+ list_add(&folio->lru, &ret_folios);
VM_BUG_ON_FOLIO(folio_test_lru(folio) ||
folio_test_unevictable(folio), folio);
}
Actually, we have tested the implementation you mentioned: putting the
contended folios back at the tail of the LRU during shrink_folio_list
and rescanning them in the next shrink_folio_list.
In some cases we found another serious problem: more and more contended
folios piled up at the tail of the LRU, which led to a serious
low-memory situation, because none of the isolated folios could be
reclaimed due to lock contention during shrink_folio_list.
Let me provide more detail.
In fact, we have tested the implementation you mentioned: if a folio is
found to be under rmap lock contention during shrink_folio_list, it is
put back at the tail of the LRU and rescanned in the next
shrink_folio_list.
During testing, we found a serious problem:
In some shrink_folio_list calls, none of the isolated pages could be
reclaimed due to rmap lock contention, resulting in very inefficient
memory reclaim and insufficient free memory.
The specific reasons are as follows:
When memory is short, folios that are put back at the tail of the LRU
due to rmap lock contention during shrink_folio_list will soon be
isolated again in shrink_inactive_list, and the next shrink_folio_list
will try to reclaim them. But in the short term these folios are still
likely to fail reclaim due to rmap lock contention and be put back at
the tail of the LRU again.
As testing progressed, more and more folios with a high probability of
rmap lock contention were put back at the tail of the LRU during
shrink_inactive_list, until eventually none of the isolated folios
could be reclaimed in shrink_folio_list.
The shrink_inactive_list procedure does the following:
shrink_inactive_list()
-> isolate_lru_folios():
   isolate 32 folios from the tail of the LRU (some of which may have
   been put back on the LRU by the last shrink_folio_list due to rmap
   lock contention)
-> shrink_folio_list():
   reclaim folios and put rmap lock-contended folios back at the tail
   of the LRU
For example, assume that in some case none of the folios put back on
the LRU due to rmap lock contention in the last shrink_folio_list can
be reclaimed, because they are still rmap lock-contended:
1st shrink_inactive_list():
-> isolate_lru_folios(): isolate 32 folios
-> shrink_folio_list(): reclaim 24 folios, putback 8 rmap lock-contended folios
2nd shrink_inactive_list():
-> isolate_lru_folios(): isolate 32 folios, include 8 rmap lock-contended folios
-> shrink_folio_list(): reclaim 16 folios, putback 16 rmap lock-contended folios
3rd shrink_inactive_list():
-> isolate_lru_folios(): isolate 32 folios, include 16 rmap lock-contended folios
-> shrink_folio_list(): reclaim 8 folios, putback 24 rmap lock-contended folios
4th shrink_inactive_list():
-> isolate_lru_folios(): isolate 32 folios, include 24 rmap lock-contended folios
-> shrink_folio_list(): reclaim 0 folios, putback 32 rmap lock-contended folios
5th shrink_inactive_list():
-> isolate_lru_folios(): isolate 32 folios, include 32 rmap lock-contended folios
-> shrink_folio_list(): reclaim 0 folios, putback 32 rmap lock-contended folios