Re: [RFC PATCH 1/7] mm/mremap: introduce more mergeable mremap via MREMAP_RELOCATE_ANON

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sat, Mar 22, 2025 at 07:17:05AM +0100, David Hildenbrand wrote:
> On 22.03.25 06:33, David Hildenbrand wrote:
> > On 22.03.25 01:14, Jann Horn wrote:
> > > On Fri, Mar 21, 2025 at 10:54 PM Lorenzo Stoakes
> > > <lorenzo.stoakes@xxxxxxxxxx> wrote:
> > > > diff --git a/mm/mremap.c b/mm/mremap.c
> > > > index 0865387531ed..bb67562a0114 100644
> > > > --- a/mm/mremap.c
> > > > +++ b/mm/mremap.c
> > > [...]
> > > > +/*
> > > > + * If the folio mapped at the specified pte entry can have its index and mapping
> > > > + * relocated, then do so.
> > > > + *
> > > > + * Returns the number of pages we have traversed, or 0 if the operation failed.
> > > > + */
> > > > +static unsigned long relocate_anon(struct pagetable_move_control *pmc,
> > > > +               unsigned long old_addr, unsigned long new_addr, pte_t pte,
> > > > +               bool undo)
> > > > +{
> > > > +       struct page *page;
> > > > +       struct folio *folio;
> > > > +       struct vm_area_struct *old, *new;
> > > > +       pgoff_t new_index;
> > > > +       unsigned long ret = 1;
> > > > +
> > > > +       old = pmc->old;
> > > > +       new = pmc->new;
> > > > +
> > > > +       /* Ensure we have truly got an anon folio. */
> > > > +       page = vm_normal_page(old, old_addr, pte);
> > > > +       if (!page)
> > > > +               return ret;
> > > > +       folio = page_folio(page);
> > > > +       folio_lock(folio);
> > > > +
> > > > +       /* no-op. */
> > > > +       if (!folio_test_anon(folio) || folio_test_ksm(folio))
> > > > +               goto out;
> > > > +
> > > > +       /*
> > > > +        * This should not happen as we explicitly disallow this, but check
> > > > +        * anyway.
> > > > +        */
> > > > +       if (folio_test_large(folio)) {
> > > > +               ret = 0;
> > > > +               goto out;
> > > > +       }
> > >
> > > Do I understand correctly that you assume here that the page is
> > > exclusively mapped? Maybe we could at least
> > > WARN_ON(folio_mapcount(folio) != 1) or something like that?
> > >
> > > (I was also wondering if the PageAnonExclusive bit is somehow
> > > relevant, but we should probably not look at or touch that here,
> > > unless we want to think about cases where we _used to_ have a child
> > > from which the page may have been GUP'd...)
> >
> > UFFDIO_MOVE implements something similar. Right now we keep it simple:
> >
> > 	if (folio_test_large(src_folio) ||
> > 	    folio_maybe_dma_pinned(src_folio) ||
> > 	    !PageAnonExclusive(&src_folio->page)) {
> > 		err = -EBUSY;
> > 		goto out;
> > 	}
> >
> > Whereby we
> >
> > a) Make sure we cover all PTEs (-> small folio, single PTE). Large
> > PTE-mapped folios are split.
> >
> > b) Make sure there are no GUP pins (maybe not required here?)
> >
> > c) The folio is exclusive to this process
>
> On additional note as my memory comes back: if PAE is set, there cannot be
> other (inactive) mappings from the swapcache. So whenever we use folio lock
> + mapcount data, the possibility of the swapcache (having inactive mappings
> from other processes etc.) must be considered.

Ack, do you have a feel for how such a check would work?

>
> --
> Cheers,
>
> David / dhildenb
>

An aside in general:

I realise all of this is very fiddly, this (albeit early) RFC is several
revisions deep and fully expect considerably more fiddly cases to come up :)

I plan to do some deeper testing on real iron running very heavy workloads to
encourage reclaim and races btw.

One test I've done in past is to just hack in forced-on MREMAP_RELOCATE_ANON so
_all_ mremap()'s that move do it, which has been a good way of finding issues,
but also in tests I add in series I intentionally trigger reclaim via
MADV_PAGEOUT.

So in general - I'm going to proactively try to really eek out all and any weird
edge case stuff where possible before more seriously pushing this series
forward.

Thanks again for early review, it's much appreciated! :) and see you at the
topic I'm giving on this and the pub after... ;)




[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux