Re: mm/khugepaged: collapse file/shmem compound pages

On Wed, May 25, 2022 at 12:07 PM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
>
> On Tue, May 24, 2022 at 03:42:55PM -0700, Zach O'Keefe wrote:
> > Hey Matthew,
> >
> > I'm leading an attempt to add a new madvise mode, MADV_COLLAPSE, to
> > allow userspace-directed collapse of memory into THPs[1]. The initial
> > proposal only supports anonymous memory, but I'm
> > working on adding support for file-backed and shmem memory.
> >
> > The intended behavior of MADV_COLLAPSE is that it should return
> > "success" if all hugepage-aligned / sized regions requested are backed
> > by pmd-mapped THPs on return (races aside). IOW: we were able to
> > successfully collapse the memory, or it was already backed by
> > pmd-mapped THPs.
> >
> > Currently there is a nice "XXX: khugepaged should compact smaller
> > compound pages into a PMD sized page" in khugepaged_scan_file() when
> > we encounter a compound page during scanning. Do you know what kind of
> > gotchas or technical difficulties would be involved in doing this? I
> > presume this work would also benefit those relying on khugepaged to
> > collapse read-only file and shmem memory, and I'd be happy to help
> > move it forward.

Hey Matthew,

Thanks for taking the time!

>
> Hi Zach,
>
> Thanks for your interest, and I'd love some help on this.
>
> The khugepaged code (like much of the mm used to) assumes that memory
> comes in two sizes, PTE and PMD.  That's still true for anon and shmem
> for now, but hopefully we'll start managing both anon & shmem memory in
> larger chunks, without necessarily going as far as PMD.
>
> I think the purpose of khugepaged should continue to be to construct
> PMD-size pages; I don't see the point of it wandering through process VMs
> replacing order-2 pages with order-5 pages.  I may be wrong about that,
> of course, so feel free to argue with me.

I'd agree here.

> Anyway, the meaning behind that comment is that the PageTransCompound()
> test is going to be true on any compound page (TransCompound doesn't
> check that the page is necessarily a THP).  So that particular test should
> be folio_test_pmd_mappable(), but there are probably other things which
> ought to be changed, including converting the entire file from dealing
> in pages to dealing in folios.

Right, at this point, the page might be a pmd-mapped THP, or it could
be a pte-mapped compound page (I'm unsure whether we can encounter
compound pages here other than hugepages).

If we could tell it's already pmd-mapped, we're done :) IIUC,
folio_test_pmd_mappable() is a necessary but not sufficient condition
to determine this.

Else, if it's not, is it safe to try and continue? Suppose we find a
folio of 0 < order < HPAGE_PMD_ORDER. Can we safely try to extend it,
or would we break filesystems that expect folios of a particular
order?

> I actually have one patch which starts in that direction, but I haven't
> followed it up yet with all the other patches to that file which will
> be needed:

Thanks for the head start! Not an expert here, but would you say
converting this file to use folios is a necessary first step?

Again, thanks for your time,
Zach

> From a64ac45ad951557103a1040c8bcc3f229022cd26 Mon Sep 17 00:00:00 2001
> From: "Matthew Wilcox (Oracle)" <willy@xxxxxxxxxxxxx>
> Date: Fri, 7 May 2021 23:40:19 -0400
> Subject: [PATCH] mm/khugepaged: Allocate folios
>
> khugepaged only wants to deal in terms of folios, so switch to
> using the folio allocation functions.  This eliminates the calls to
> prep_transhuge_page() and saves dozens of bytes of text.
>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@xxxxxxxxxxxxx>
> ---
>  mm/khugepaged.c | 32 ++++++++++++--------------------
>  1 file changed, 12 insertions(+), 20 deletions(-)
>
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 637bfecd6bf5..ec60ee4e14c9 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -854,18 +854,20 @@ static bool khugepaged_prealloc_page(struct page **hpage, bool *wait)
>  static struct page *
>  khugepaged_alloc_page(struct page **hpage, gfp_t gfp, int node)
>  {
> +       struct folio *folio;
> +
>         VM_BUG_ON_PAGE(*hpage, *hpage);
>
> -       *hpage = __alloc_pages_node(node, gfp, HPAGE_PMD_ORDER);
> -       if (unlikely(!*hpage)) {
> +       folio = __folio_alloc_node(gfp, HPAGE_PMD_ORDER, node);
> +       if (unlikely(!folio)) {
>                 count_vm_event(THP_COLLAPSE_ALLOC_FAILED);
>                 *hpage = ERR_PTR(-ENOMEM);
>                 return NULL;
>         }
>
> -       prep_transhuge_page(*hpage);
>         count_vm_event(THP_COLLAPSE_ALLOC);
> -       return *hpage;
> +       *hpage = &folio->page;
> +       return &folio->page;
>  }
>  #else
>  static int khugepaged_find_target_node(void)
> @@ -873,24 +875,14 @@ static int khugepaged_find_target_node(void)
>         return 0;
>  }
>
> -static inline struct page *alloc_khugepaged_hugepage(void)
> -{
> -       struct page *page;
> -
> -       page = alloc_pages(alloc_hugepage_khugepaged_gfpmask(),
> -                          HPAGE_PMD_ORDER);
> -       if (page)
> -               prep_transhuge_page(page);
> -       return page;
> -}
> -
>  static struct page *khugepaged_alloc_hugepage(bool *wait)
>  {
> -       struct page *hpage;
> +       struct folio *folio;
>
>         do {
> -               hpage = alloc_khugepaged_hugepage();
> -               if (!hpage) {
> +               folio = folio_alloc(alloc_hugepage_khugepaged_gfpmask(),
> +                                       HPAGE_PMD_ORDER);
> +               if (!folio) {
>                         count_vm_event(THP_COLLAPSE_ALLOC_FAILED);
>                         if (!*wait)
>                                 return NULL;
> @@ -899,9 +891,9 @@ static struct page *khugepaged_alloc_hugepage(bool *wait)
>                         khugepaged_alloc_sleep();
>                 } else
>                         count_vm_event(THP_COLLAPSE_ALLOC);
> -       } while (unlikely(!hpage) && likely(khugepaged_enabled()));
> +       } while (unlikely(!folio) && likely(khugepaged_enabled()));
>
> -       return hpage;
> +       return &folio->page;
>  }
>
>  static bool khugepaged_prealloc_page(struct page **hpage, bool *wait)
> --
> 2.34.1
>



