Re: [v2 PATCH 0/8] Make khugepaged collapse readonly FS THP more consistent

Dave Chinner <david@xxxxxxxxxxxxx> · Fri, 18 Mar 2022 12:29:48 +1100

On Thu, Mar 17, 2022 at 04:48:19PM -0700, Yang Shi wrote:
> 
> Changelog
> v2: * Collected reviewed-by tags from Miaohe Lin.
>     * Fixed build error for patch 4/8.
> 
> The readonly FS THP relies on khugepaged to collapse THP for suitable
> vmas.  But it is kind of "random luck" for khugepaged to see the
> readonly FS vmas (see report:
> https://lore.kernel.org/linux-mm/00f195d4-d039-3cf2-d3a1-a2c88de397a0@xxxxxxx/)
> since currently the vmas are registered to khugepaged when:
>   - Anon huge pmd page fault
>   - VMA merge
>   - MADV_HUGEPAGE
>   - Shmem mmap
> 
> If the above conditions are not met, even though khugepaged is enabled
> it won't see readonly FS vmas at all.  MADV_HUGEPAGE could be specified
> explicitly to tell khugepaged to collapse this area, but when khugepaged
> mode is "always" it should scan suitable vmas as long as VM_NOHUGEPAGE
> is not set.
> 
> So make sure readonly FS vmas are registered to khugepaged to make the
> behavior more consistent.
> 
> Registering the vmas in mmap path seems more preferred from performance
> point of view since page fault path is definitely hot path.
> 
> 
> The patch 1 ~ 7 are minor bug fixes, clean up and preparation patches.
> The patch 8 converts ext4 and xfs.  We may need convert more filesystems,
> but I'd like to hear some comments before doing that.

After reading through the patchset, I have no idea what this is even
doing or enabling. I can't comment on the last patch and its effect
on XFS because there's no high level explanation of the
functionality or feature to provide me with the context in which I
should be reviewing this patchset.

I understand this has something to do with hugepages, but there's no
explanation of exactly where huge pages are going to be used in the
filesystem, what the problems with khugepaged and filesystems are
that this apparently solves, what constraints it places on
filesystems to enable huge pages to be used, etc.

I'm guessing that the result is that we'll suddenly see huge pages
in the page cache for some undefined set of files in some undefined
set of workloads. But that doesn't help me understand any of the
impacts it may have, e.g.:

- how does this relate to the folio conversion and use of large
  pages in the page cache?
- why do we want two completely separate large page mechanisms in
  the page cache?
- why is this limited to "read only VMAs" and how does the
  filesystem actually ensure that the VMAs are read only?
- what happens if we have a file that has huge pages mapped into the
  page cache via read only VMAs, and write() is then called on it via
  a different file descriptor, so that we need to dirty the page cache
  that has huge pages in it?
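
To make that last question concrete, here is a rough userspace sketch
of the access pattern I mean. It is only an illustration: the function
name, the 2MB length and the choice of MAP_SHARED are assumptions, not
anything taken from the patchset, and nothing in it guarantees that
khugepaged ever collapses the range into a huge page.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

#define MAP_LEN (2UL * 1024 * 1024)	/* one PMD-sized range on x86-64 */

/*
 * Hypothetical helper: map a file read-only, hint khugepaged, then
 * modify the file through a second, writable fd. Returns the byte seen
 * through the mapping after the write, or -1 on error.
 */
int write_under_readonly_map(const char *path, unsigned char val)
{
	int rfd, wfd, ret = -1;
	unsigned char *map;

	rfd = open(path, O_RDONLY);
	if (rfd < 0)
		return -1;

	map = mmap(NULL, MAP_LEN, PROT_READ, MAP_SHARED, rfd, 0);
	if (map == MAP_FAILED)
		goto out_close;

#ifdef MADV_HUGEPAGE
	/* Only a hint; it may fail (e.g. EINVAL) if THP is not configured. */
	(void)madvise(map, MAP_LEN, MADV_HUGEPAGE);
#endif
	(void)*(volatile unsigned char *)map;	/* fault the first page in */

	/* A different fd, opened for write, dirties the same page cache. */
	wfd = open(path, O_WRONLY);
	if (wfd < 0)
		goto out_unmap;
	if (pwrite(wfd, &val, 1, 0) == 1)
		ret = *(volatile unsigned char *)map;	/* read via the mapping */
	close(wfd);
out_unmap:
	munmap(map, MAP_LEN);
out_close:
	close(rfd);
	return ret;
}
```

The file needs to be at least one byte long for the reads through the
mapping to be valid. What I want to know is what the page cache does
at the pwrite() step if the range under the mapping has already been
collapsed.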

I've got a lot more questions, but to save me having to ask them,
how about you explain what this new functionality actually does, why
we need to support it, and why it is better than the fully writeable
huge page support via folios that we already have in the works...

Cheers,

Dave.

-- 
Dave Chinner
david@xxxxxxxxxxxxx