Re: [PATCH 1/5] mm: Do not reclaim private data from pinned page

David Hildenbrand <david@xxxxxxxxxx> · Mon, 13 Feb 2023 10:01:35 +0100

On 09.02.23 13:31, Jan Kara wrote:
If the page is pinned, there's no point in trying to reclaim it.
Furthermore if the page is from the page cache we don't want to reclaim
fs-private data from the page because the pinning process may be writing
to the page at any time and reclaiming fs private info on a dirty page
can upset the filesystem (see link below).

Link: https://lore.kernel.org/linux-mm/20180103100430.GE4911@xxxxxxxxxxxxxx
Signed-off-by: Jan Kara <jack@xxxxxxx>
---
  mm/vmscan.c | 10 ++++++++++
  1 file changed, 10 insertions(+)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index bf3eedf0209c..ab3911a8b116 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1901,6 +1901,16 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
  			}
  		}
  
+		/*
+		 * Folio is unmapped now so it cannot be newly pinned anymore.
+		 * No point in trying to reclaim folio if it is pinned.
+		 * Furthermore we don't want to reclaim underlying fs metadata
+		 * if the folio is pinned and thus potentially modified by the
+		 * pinning process is that may upset the filesystem.
+		 */
+		if (folio_maybe_dma_pinned(folio))
+			goto activate_locked;
+
  		mapping = folio_mapping(folio);
  		if (folio_test_dirty(folio)) {
  			/*

At this point, we made sure that the folio is completely unmapped. 
However, we specify "TTU_BATCH_FLUSH", so rmap code might defer a TLB 
flush and consequently defer an IPI sync.

I remember that this check here is fine regarding GUP-fast: even if 
concurrent GUP-fast pins the page after our check here, it should 
observe the changed PTE and unpin it again.


Checking after unmapping makes sense: we reduce the likelyhood of false 
positives when a file-backed page is mapped many times (>= 1024). OTOH, 
we might unmap pinned pages because we cannot really detect it early.

For anon pages, we have an early (racy) check, which turned out "ok" in 
practice, because we don't frequently have that many anon pages that are 
shared by that many processes. I assume we don't want something similar 
for pagecache pages, because having a single page mapped by many 
processes can happen easily and would prevent reclaim.

I once had a patch lying around that documented for the existing 
folio_maybe_dma_pinned() for anon pages exactly that (racy+false 
positives with many mappings).


Long story short, I assume this change is fine.

--
Thanks,

David / dhildenb