Thanks for the early review, pints_owed++ :>)

On Sat, Mar 22, 2025 at 06:33:22AM +0100, David Hildenbrand wrote:
> On 22.03.25 01:14, Jann Horn wrote:
> > On Fri, Mar 21, 2025 at 10:54 PM Lorenzo Stoakes
> > <lorenzo.stoakes@xxxxxxxxxx> wrote:
> > > diff --git a/mm/mremap.c b/mm/mremap.c
> > > index 0865387531ed..bb67562a0114 100644
> > > --- a/mm/mremap.c
> > > +++ b/mm/mremap.c
> > [...]
> > > +/*
> > > + * If the folio mapped at the specified pte entry can have its index and mapping
> > > + * relocated, then do so.
> > > + *
> > > + * Returns the number of pages we have traversed, or 0 if the operation failed.
> > > + */
> > > +static unsigned long relocate_anon(struct pagetable_move_control *pmc,
> > > +		unsigned long old_addr, unsigned long new_addr, pte_t pte,
> > > +		bool undo)
> > > +{
> > > +	struct page *page;
> > > +	struct folio *folio;
> > > +	struct vm_area_struct *old, *new;
> > > +	pgoff_t new_index;
> > > +	unsigned long ret = 1;
> > > +
> > > +	old = pmc->old;
> > > +	new = pmc->new;
> > > +
> > > +	/* Ensure we have truly got an anon folio. */
> > > +	page = vm_normal_page(old, old_addr, pte);
> > > +	if (!page)
> > > +		return ret;
> > > +	folio = page_folio(page);
> > > +	folio_lock(folio);
> > > +
> > > +	/* no-op. */
> > > +	if (!folio_test_anon(folio) || folio_test_ksm(folio))
> > > +		goto out;
> > > +
> > > +	/*
> > > +	 * This should not happen as we explicitly disallow this, but check
> > > +	 * anyway.
> > > +	 */
> > > +	if (folio_test_large(folio)) {
> > > +		ret = 0;
> > > +		goto out;
> > > +	}
> >
> > Do I understand correctly that you assume here that the page is
> > exclusively mapped? Maybe we could at least
> > WARN_ON(folio_mapcount(folio) != 1) or something like that?
> >
> > (I was also wondering if the PageAnonExclusive bit is somehow
> > relevant, but we should probably not look at or touch that here,
> > unless we want to think about cases where we _used to_ have a child
> > from which the page may have been GUP'd...)
>
> UFFDIO_MOVE implements something similar. Right now we keep it simple:
>
> if (folio_test_large(src_folio) ||
>     folio_maybe_dma_pinned(src_folio) ||
>     !PageAnonExclusive(&src_folio->page)) {
> 	err = -EBUSY;
> 	goto out;
> }
>
> Whereby we
>
> a) Make sure we cover all PTEs (-> small folio, single PTE). Large
>    PTE-mapped folios are split.
>
> b) Make sure there are no GUP pins (maybe not required here?)
>
> c) The folio is exclusive to this process

Yeah, later I actually add handling for large folios :) but that is a
separate change.

The maybe-dma-pinned point is a fair one, and I probably need to add that
check too. Will do so.

>
> In general, things I can reason about with confidence are:
>
> a) As alternative to PageAnonExclusive(), we can check folio_mapcount()==1
>    under the folio lock for small folios / PMD-mapped folios. As you (Jann)
>    say, there might be unexpected references on the folio from other processes.

Ack, for sure will add.

>
> b) For large (pte-mapped) folios, we could try batching multiple PTEs
>    (folio_pte_batch() etc.). We'd be processing all mappings with folio_lock +
>    folio_mapcount() == #PTEs.

Interesting, I hadn't thought about this; maybe we can discuss at LSF?

>
> c) In -next, there is now the option to use folio lock +
>    folio_maybe_mapped_shared() == false. But it doesn't tell you how many
>    VMAs a large folio is mapped into.
>
> In the following case:
>
> [ folio ]
> [ VMA#1 ] [ VMA#2 ]
>
> c) would not tell you if you are fine modifying the folio when moving VMA#2.
Right, I feel like the prior checks we make should assert this is not the
case, however? But the mapcount check should be a last-ditch assurance?

(actually at least one of the 'prior checks' for large folios is added in a
later commit, but still :P)
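
FWIW, for the small folio case, the extra checks I have in mind would look
something like the below (a sketch only; the helper name is made up and the
exact placement in relocate_anon() is still to be worked out):

/*
 * Sketch: can we safely relocate the index/mapping of a small anon folio?
 * Must be called with the folio locked. Name is illustrative only.
 */
static bool can_relocate_anon_folio(struct folio *folio)
{
	VM_WARN_ON_ONCE(!folio_test_locked(folio));
	VM_WARN_ON_ONCE(folio_test_large(folio));

	/*
	 * A small folio mapped exactly once cannot be mapped into any other
	 * process or VMA, so nobody else can observe us rewriting
	 * folio->index / folio->mapping. Checked under the folio lock.
	 */
	if (folio_mapcount(folio) != 1)
		return false;

	/*
	 * Mirror UFFDIO_MOVE and refuse folios which may be GUP-pinned;
	 * possibly not strictly required here, but err on the safe side.
	 */
	if (folio_maybe_dma_pinned(folio))
		return false;

	return true;
}

relocate_anon() would then invoke this right after the existing
folio_test_anon()/folio_test_ksm()/folio_test_large() checks, setting ret = 0
and jumping to out on failure, with large folio handling layered on top in
the later commit.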