On 22.03.25 06:33, David Hildenbrand wrote:
On 22.03.25 01:14, Jann Horn wrote:
On Fri, Mar 21, 2025 at 10:54 PM Lorenzo Stoakes
<lorenzo.stoakes@xxxxxxxxxx> wrote:
diff --git a/mm/mremap.c b/mm/mremap.c
index 0865387531ed..bb67562a0114 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
[...]
+/*
+ * If the folio mapped at the specified pte entry can have its index and mapping
+ * relocated, then do so.
+ *
+ * Returns the number of pages we have traversed, or 0 if the operation failed.
+ */
+static unsigned long relocate_anon(struct pagetable_move_control *pmc,
+ unsigned long old_addr, unsigned long new_addr, pte_t pte,
+ bool undo)
+{
+ struct page *page;
+ struct folio *folio;
+ struct vm_area_struct *old, *new;
+ pgoff_t new_index;
+ unsigned long ret = 1;
+
+ old = pmc->old;
+ new = pmc->new;
+
+ /* Ensure we have truly got an anon folio. */
+ page = vm_normal_page(old, old_addr, pte);
+ if (!page)
+ return ret;
+ folio = page_folio(page);
+ folio_lock(folio);
+
+ /* no-op. */
+ if (!folio_test_anon(folio) || folio_test_ksm(folio))
+ goto out;
+
+ /*
+ * This should not happen as we explicitly disallow this, but check
+ * anyway.
+ */
+ if (folio_test_large(folio)) {
+ ret = 0;
+ goto out;
+ }
Do I understand correctly that you assume here that the page is
exclusively mapped? Maybe we could at least
WARN_ON(folio_mapcount(folio) != 1) or something like that?
(I was also wondering if the PageAnonExclusive bit is somehow
relevant, but we should probably not look at or touch that here,
unless we want to think about cases where we _used to_ have a child
from which the page may have been GUP'd...)
UFFDIO_MOVE implements something similar. Right now we keep it simple:
if (folio_test_large(src_folio) ||
folio_maybe_dma_pinned(src_folio) ||
!PageAnonExclusive(&src_folio->page)) {
err = -EBUSY;
goto out;
}
Whereby we
a) Make sure we cover all PTEs (-> small folio, single PTE). Large
PTE-mapped folios are split.
b) Make sure there are no GUP pins (maybe not required here?)
c) The folio is exclusive to this process
On additional note as my memory comes back: if PAE is set, there cannot
be other (inactive) mappings from the swapcache. So whenever we use
folio lock + mapcount data, the possibility of the swapcache (having
inactive mappings from other processes etc.) must be considered.
--
Cheers,
David / dhildenb