Re: [RFC PATCH 1/7] mm/mremap: introduce more mergeable mremap via MREMAP_RELOCATE_ANON

On 22.03.25 01:14, Jann Horn wrote:
On Fri, Mar 21, 2025 at 10:54 PM Lorenzo Stoakes
<lorenzo.stoakes@xxxxxxxxxx> wrote:
diff --git a/mm/mremap.c b/mm/mremap.c
index 0865387531ed..bb67562a0114 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
[...]
+/*
+ * If the folio mapped at the specified pte entry can have its index and mapping
+ * relocated, then do so.
+ *
+ * Returns the number of pages we have traversed, or 0 if the operation failed.
+ */
+static unsigned long relocate_anon(struct pagetable_move_control *pmc,
+               unsigned long old_addr, unsigned long new_addr, pte_t pte,
+               bool undo)
+{
+       struct page *page;
+       struct folio *folio;
+       struct vm_area_struct *old, *new;
+       pgoff_t new_index;
+       unsigned long ret = 1;
+
+       old = pmc->old;
+       new = pmc->new;
+
+       /* Ensure we have truly got an anon folio. */
+       page = vm_normal_page(old, old_addr, pte);
+       if (!page)
+               return ret;
+       folio = page_folio(page);
+       folio_lock(folio);
+
+       /* no-op. */
+       if (!folio_test_anon(folio) || folio_test_ksm(folio))
+               goto out;
+
+       /*
+        * This should not happen as we explicitly disallow this, but check
+        * anyway.
+        */
+       if (folio_test_large(folio)) {
+               ret = 0;
+               goto out;
+       }

Do I understand correctly that you assume here that the page is
exclusively mapped? Maybe we could at least
WARN_ON(folio_mapcount(folio) != 1) or something like that?

(I was also wondering if the PageAnonExclusive bit is somehow
relevant, but we should probably not look at or touch that here,
unless we want to think about cases where we _used to_ have a child
from which the page may have been GUP'd...)

UFFDIO_MOVE implements something similar. Right now we keep it simple:

	if (folio_test_large(src_folio) ||
	    folio_maybe_dma_pinned(src_folio) ||
	    !PageAnonExclusive(&src_folio->page)) {
		err = -EBUSY;
		goto out;
	}

Whereby we

a) Make sure we cover all PTEs (-> small folio, single PTE). Large PTE-mapped folios are split.

b) Make sure there are no GUP pins (maybe not required here?)

c) Make sure the folio is exclusive to this process


In general, things I can reason about with confidence are:

a) As an alternative to PageAnonExclusive(), we can check folio_mapcount() == 1 under the folio lock for small folios / PMD-mapped folios. As you (Jann) say, there might be unexpected references on the folio from other processes.

b) For large (pte-mapped) folios, we could try batching multiple PTEs (folio_pte_batch() etc.). We'd be processing all mappings with folio_lock + folio_mapcount() == #PTEs.

c) In -next, there is now the option to use the folio lock + folio_maybe_mapped_shared() == false. But it doesn't tell you how many VMAs a large folio is mapped into.

In the following case:

[       folio     ]
[ VMA#1 ] [ VMA#2 ]

c) would not tell you whether you are fine modifying the folio when moving VMA#2.


--
Cheers,

David / dhildenb
