Re: mm/khugepaged: collapse file/shmem compound pages

Matthew Wilcox <willy@xxxxxxxxxxxxx> · Thu, 26 May 2022 04:36:04 +0100

On Wed, May 25, 2022 at 06:23:52PM -0700, Zach O'Keefe wrote:
> On Wed, May 25, 2022 at 12:07 PM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
> > The khugepaged code (like much of the mm used to) assumes that memory
> > comes in two sizes, PTE and PMD.  That's still true for anon and shmem
> > for now, but hopefully we'll start managing both anon & shmem memory in
> > larger chunks, without necessarily going as far as PMD.
> >
> > I think the purpose of khugepaged should continue to be to construct
> > PMD-size pages; I don't see the point of it wandering through process VMs
> > replacing order-2 pages with order-5 pages.  I may be wrong about that,
> > of course, so feel free to argue with me.
> 
> I'd agree here.
> 
> > Anyway, that meaning behind that comment is that the PageTransCompound()
> > test is going to be true on any compound page (TransCompound doesn't
> > check that the page is necessarily a THP).  So that particular test should
> > be folio_test_pmd_mappable(), but there are probably other things which
> > ought to be changed, including converting the entire file from dealing
> > in pages to dealing in folios.
> 
> Right, at this point, the page might be a pmd-mapped THP, or it could
> be a pte-mapped compound page (I'm unsure if we can encounter compound
> pages outside hugepages).

Today, there is a way.  We can find a folio with an order between 0 and
PMD_ORDER if the underlying filesystem supports large folios and the
file is executable and we've enabled CONFIG_READ_ONLY_THP_FOR_FS.
In this case, we'll simply skip over it because the code believes that
means it's already a PMD.

> If we could tell it's already pmd-mapped, we're done :) IIUC,
> folio_test_pmd_mappable() is a necessary but not sufficient condition
> to determine this.

It is necessary, but from khugepaged's point of view, it's sufficient
because khugepaged's job is to create PMD-sized folios -- it's not up to
khugepaged to ensure that PMD-sized folios are actually mapped using
a PMD.  There may be some other component of the system (eg DAMON?)
which has chosen to temporarily map the PMD-sized folio using PTEs
in order to track whether the memory is all being used.  It may also
be the case that (for file-based memory), the VMA is mis-aligned and
despite creating a PMD-sized folio, it can't be mapped with a PMD.

> Else, if it's not, is it safe to try and continue? Suppose we find a
> folio of 0 < order < HPAGE_PMD_ORDER. Are we safely able to try and
> extend it, or will we break some filesystems that expect a certain
> order folio?

We're not giving filesystems the opportunity to request that ;-)
Filesystems are expected to handle folios of arbitrary order (if they
claim the ability to support large folios at all).  In practice, I've
capped the folio creation size at PMD_ORDER (because I don't want to track
down all the places that assume pmd_page() is necessarily a head page),
but filesystems shouldn't rely on it.

shmem still expects folios to be of order either 0 or PMD_ORDER.
That assumption extends into the swap code and I haven't had the heart
to go and fix all those places yet.  Plus Neil was doing major surgery
to the swap code in the most recent deveopment cycle and I didn't want
to get in his way.

So I am absolutely fine with khugepaged allocating a PMD-size folio for
any inode that claims mapping_large_folio_support().  If any filesystems
break, we'll fix them.

> > I actually have one patch which starts in that direction, but I haven't
> > followed it up yet with all the other patches to that file which will
> > be needed:
> 
> Thanks for the head start! Not an expert here, but would you say
> converting this file to use folios is a necessary first step?

Not _necessary_, but I find it helps keep things clearer.  Plus it's
something that needs to happen anyway.