On Wed, May 25, 2022 at 12:07 PM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
>
> On Tue, May 24, 2022 at 03:42:55PM -0700, Zach O'Keefe wrote:
> > Hey Matthew,
> >
> > I'm leading an attempt to add a new madvise mode, MADV_COLLAPSE, to
> > allow userspace-directed collapse of memory into THPs[1]. The initial
> > proposal only supports anonymous memory, but I'm working on adding
> > support for file-backed and shmem memory.
> >
> > The intended behavior of MADV_COLLAPSE is that it should return
> > "success" if all hugepage-aligned / sized regions requested are backed
> > by pmd-mapped THPs on return (races aside). IOW: we were able to
> > successfully collapse the memory, or it was already backed by
> > pmd-mapped THPs.
> >
> > Currently there is a nice "XXX: khugepaged should compact smaller
> > compound pages into a PMD sized page" in khugepaged_scan_file() when
> > we encounter a compound page during scanning. Do you know what kind of
> > gotchas or technical difficulties would be involved in doing this? I
> > presume this work would also benefit those relying on khugepaged to
> > collapse read-only file and shmem memory, and I'd be happy to help
> > move it forward.

Hey Matthew,

Thanks for taking the time!

> Hi Zach,
>
> Thanks for your interest, and I'd love some help on this.
>
> The khugepaged code (like much of the mm used to) assumes that memory
> comes in two sizes, PTE and PMD. That's still true for anon and shmem
> for now, but hopefully we'll start managing both anon & shmem memory in
> larger chunks, without necessarily going as far as PMD.
>
> I think the purpose of khugepaged should continue to be to construct
> PMD-size pages; I don't see the point of it wandering through process VMs
> replacing order-2 pages with order-5 pages. I may be wrong about that,
> of course, so feel free to argue with me.

I'd agree here.

> Anyway, the meaning behind that comment is that the PageTransCompound()
> test is going to be true on any compound page (TransCompound doesn't
> check that the page is necessarily a THP). So that particular test should
> be folio_test_pmd_mappable(), but there are probably other things which
> ought to be changed, including converting the entire file from dealing
> in pages to dealing in folios.

Right. At this point the page might be a pmd-mapped THP, or it could
be a pte-mapped compound page (I'm unsure if we can encounter compound
pages here other than hugepages). If we could tell it's already
pmd-mapped, we're done :) IIUC, folio_test_pmd_mappable() is a
necessary but not sufficient condition to determine this.
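For reference, folio_test_pmd_mappable() is purely a size check; from
include/linux/huge_mm.h it's roughly:

static inline bool folio_test_pmd_mappable(struct folio *folio)
{
	return folio_order(folio) >= HPAGE_PMD_ORDER;
}

so it says the folio is large enough to be pmd-mapped, not that it
actually is. To sketch what I mean (the helper name is made up, a real
version would need the pmd lock, and for file pages we'd have to
consider every mm mapping the folio), confirming the mapping itself
would mean walking to the pmd and testing it directly:

static bool folio_pmd_mapped_here(struct mm_struct *mm, unsigned long addr)
{
	pgd_t *pgd = pgd_offset(mm, addr);
	p4d_t *p4d;
	pud_t *pud;
	pmd_t *pmd;

	if (pgd_none_or_clear_bad(pgd))
		return false;
	p4d = p4d_offset(pgd, addr);
	if (p4d_none_or_clear_bad(p4d))
		return false;
	pud = pud_offset(p4d, addr);
	if (pud_none_or_clear_bad(pud))
		return false;
	pmd = pmd_offset(pud, addr);

	/* Racy without the pmd lock; illustration only. */
	return pmd_trans_huge(*pmd);
}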
Otherwise, if it's not already pmd-mapped, is it safe to try to
continue? Suppose we find a folio of 0 < order < HPAGE_PMD_ORDER. Are
we safely able to try to extend it, or will we break some filesystems
that expect a certain order folio?

> I actually have one patch which starts in that direction, but I haven't
> followed it up yet with all the other patches to that file which will
> be needed:

Thanks for the head start! Not an expert here, but would you say
converting this file to use folios is a necessary first step?

Again, thanks for your time,
Zach

> From a64ac45ad951557103a1040c8bcc3f229022cd26 Mon Sep 17 00:00:00 2001
> From: "Matthew Wilcox (Oracle)" <willy@xxxxxxxxxxxxx>
> Date: Fri, 7 May 2021 23:40:19 -0400
> Subject: [PATCH] mm/khugepaged: Allocate folios
>
> khugepaged only wants to deal in terms of folios, so switch to
> using the folio allocation functions. This eliminates the calls to
> prep_transhuge_page() and saves dozens of bytes of text.
>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@xxxxxxxxxxxxx>
> ---
>  mm/khugepaged.c | 32 ++++++++++++--------------------
>  1 file changed, 12 insertions(+), 20 deletions(-)
>
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 637bfecd6bf5..ec60ee4e14c9 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -854,18 +854,20 @@ static bool khugepaged_prealloc_page(struct page **hpage, bool *wait)
>  static struct page *
>  khugepaged_alloc_page(struct page **hpage, gfp_t gfp, int node)
>  {
> +	struct folio *folio;
> +
>  	VM_BUG_ON_PAGE(*hpage, *hpage);
>
> -	*hpage = __alloc_pages_node(node, gfp, HPAGE_PMD_ORDER);
> -	if (unlikely(!*hpage)) {
> +	folio = __folio_alloc_node(gfp, HPAGE_PMD_ORDER, node);
> +	if (unlikely(!folio)) {
>  		count_vm_event(THP_COLLAPSE_ALLOC_FAILED);
>  		*hpage = ERR_PTR(-ENOMEM);
>  		return NULL;
>  	}
>
> -	prep_transhuge_page(*hpage);
>  	count_vm_event(THP_COLLAPSE_ALLOC);
> -	return *hpage;
> +	*hpage = &folio->page;
> +	return &folio->page;
>  }
>  #else
>  static int khugepaged_find_target_node(void)
> @@ -873,24 +875,14 @@ static int khugepaged_find_target_node(void)
>  	return 0;
>  }
>
> -static inline struct page *alloc_khugepaged_hugepage(void)
> -{
> -	struct page *page;
> -
> -	page = alloc_pages(alloc_hugepage_khugepaged_gfpmask(),
> -			   HPAGE_PMD_ORDER);
> -	if (page)
> -		prep_transhuge_page(page);
> -	return page;
> -}
> -
>  static struct page *khugepaged_alloc_hugepage(bool *wait)
>  {
> -	struct page *hpage;
> +	struct folio *folio;
>
>  	do {
> -		hpage = alloc_khugepaged_hugepage();
> -		if (!hpage) {
> +		folio = folio_alloc(alloc_hugepage_khugepaged_gfpmask(),
> +				HPAGE_PMD_ORDER);
> +		if (!folio) {
>  			count_vm_event(THP_COLLAPSE_ALLOC_FAILED);
>  			if (!*wait)
>  				return NULL;
> @@ -899,9 +891,9 @@ static struct page *khugepaged_alloc_hugepage(bool *wait)
>  			khugepaged_alloc_sleep();
>  		} else
>  			count_vm_event(THP_COLLAPSE_ALLOC);
> -	} while (unlikely(!hpage) && likely(khugepaged_enabled()));
> +	} while (unlikely(!folio) && likely(khugepaged_enabled()));
>
> -	return hpage;
> +	return &folio->page;
>  }
>
>  static bool khugepaged_prealloc_page(struct page **hpage, bool *wait)
> --
> 2.34.1
>
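P.S. To check my understanding of the scan-side change: today
khugepaged_scan_file() bails with

		if (PageTransCompound(page)) {
			result = SCAN_PAGE_COMPOUND;
			break;
		}

and IIUC, after a folio conversion the test you describe might look
something like this (just a sketch of my reading, not from any posted
patch; "folio" assumes the surrounding loop has been converted as in
the patch above):

		struct folio *folio = page_folio(page);

		if (folio_test_pmd_mappable(folio)) {
			/* Already PMD-size; nothing for khugepaged to do. */
			result = SCAN_PAGE_COMPOUND;
			break;
		}
		/* Smaller compound folio: the "XXX" collapse-into-PMD case. */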