Re: [PATCH 1/5] mm: Do not reclaim private data from pinned page

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 09.02.23 13:31, Jan Kara wrote:
If the page is pinned, there's no point in trying to reclaim it.
Furthermore if the page is from the page cache we don't want to reclaim
fs-private data from the page because the pinning process may be writing
to the page at any time and reclaiming fs private info on a dirty page
can upset the filesystem (see link below).

Link: https://lore.kernel.org/linux-mm/20180103100430.GE4911@xxxxxxxxxxxxxx
Signed-off-by: Jan Kara <jack@xxxxxxx>
---
  mm/vmscan.c | 10 ++++++++++
  1 file changed, 10 insertions(+)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index bf3eedf0209c..ab3911a8b116 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1901,6 +1901,16 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
  			}
  		}
+ /*
+		 * Folio is unmapped now so it cannot be newly pinned anymore.
+		 * No point in trying to reclaim folio if it is pinned.
+		 * Furthermore we don't want to reclaim underlying fs metadata
+		 * if the folio is pinned and thus potentially modified by the
+		 * pinning process is that may upset the filesystem.
+		 */
+		if (folio_maybe_dma_pinned(folio))
+			goto activate_locked;
+
  		mapping = folio_mapping(folio);
  		if (folio_test_dirty(folio)) {
  			/*

At this point, we made sure that the folio is completely unmapped. However, we specify "TTU_BATCH_FLUSH", so rmap code might defer a TLB flush and consequently defer an IPI sync.

I remember that this check here is fine regarding GUP-fast: even if concurrent GUP-fast pins the page after our check here, it should observe the changed PTE and unpin it again.


Checking after unmapping makes sense: we reduce the likelyhood of false positives when a file-backed page is mapped many times (>= 1024). OTOH, we might unmap pinned pages because we cannot really detect it early.

For anon pages, we have an early (racy) check, which turned out "ok" in practice, because we don't frequently have that many anon pages that are shared by that many processes. I assume we don't want something similar for pagecache pages, because having a single page mapped by many processes can happen easily and would prevent reclaim.

I once had a patch lying around that documented for the existing folio_maybe_dma_pinned() for anon pages exactly that (racy+false positives with many mappings).


Long story short, I assume this change is fine.

--
Thanks,

David / dhildenb




[Index of Archives]     [Linux RAID]     [Linux SCSI]     [Linux ATA RAID]     [IDE]     [Linux Wireless]     [Linux Kernel]     [ATH6KL]     [Linux Bluetooth]     [Linux Netdev]     [Kernel Newbies]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Device Mapper]

  Powered by Linux