Re: [EXTERNAL] [PATCH] mm/thp: fix "mm: thp: kill __transhuge_page_enabled()"

Matthew Wilcox <willy@xxxxxxxxxxxxx> · Mon, 14 Aug 2023 20:06:12 +0100

On Mon, Aug 14, 2023 at 11:47:50AM -0700, Zach O'Keefe wrote:
> Willy -- I'm not up-to-date on what is happening on the THP-fs front.
> Should we be checking for a ->huge_fault handler here?

Oh, thank goodness, I thought you were cc'ing me to ask a DAX question ...

From a large folios perspective, filesystems do not implement a special
handler.  They call filemap_fault() (directly or indirectly) from their
->fault handler.  If there is already a folio in the page cache which
satisfies this fault, we insert it into the page tables (no matter what
size it is).  If there is no folio, we call readahead to populate that
index in the page cache, and probably some other indices around it.
That's do_sync_mmap_readahead().
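
As a rough illustration of that flow, here is a userspace toy model. It is
not kernel code: every toy_* name, the fixed-size array standing in for the
page cache, and the four-page readahead window are assumptions made up for
this sketch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a folio: covers (1UL << order) pages starting at index. */
struct toy_folio {
	unsigned long index;	/* first page index covered */
	unsigned int order;	/* folio spans 1UL << order pages */
};

/* Toy stand-in for the page cache: a small unsorted array. */
struct toy_cache {
	struct toy_folio folios[64];
	size_t nr;
};

/* Return the cached folio covering @index, or NULL on a miss. */
static const struct toy_folio *toy_lookup(const struct toy_cache *c,
					  unsigned long index)
{
	for (size_t i = 0; i < c->nr; i++) {
		const struct toy_folio *f = &c->folios[i];

		if (index >= f->index && index - f->index < (1UL << f->order))
			return f;
	}
	return NULL;
}

/* Hypothetical readahead: add order-0 folios for [index, index + nr_pages). */
static void toy_readahead(struct toy_cache *c, unsigned long index,
			  unsigned long nr_pages)
{
	for (unsigned long i = 0; i < nr_pages && c->nr < 64; i++) {
		if (toy_lookup(c, index + i))
			continue;	/* already cached, nothing to read */
		c->folios[c->nr++] =
			(struct toy_folio){ .index = index + i, .order = 0 };
	}
}

/*
 * Toy fault path: a hit returns whatever folio is cached, whatever its
 * size; a miss populates the faulting index plus a few neighbours first.
 * *did_io reports whether readahead had to run.
 */
static const struct toy_folio *toy_fault(struct toy_cache *c,
					 unsigned long index, bool *did_io)
{
	const struct toy_folio *f = toy_lookup(c, index);

	*did_io = false;
	if (!f) {
		toy_readahead(c, index, 4);
		*did_io = true;
		f = toy_lookup(c, index);
	}
	return f;
}
```

With an order-9 folio cached at index 512, toy_fault() on index 700 returns
that folio without any I/O; on an empty range it populates a few order-0
entries and returns the one covering the faulting index.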

If you look at that, you'll see that we check the VM_HUGEPAGE flag, and
if set we align to a PMD boundary and read two PMD-size pages (so that we
can do async readahead for the second page, if we're doing a linear scan).
If the VM_HUGEPAGE flag isn't set, we'll use the readahead algorithm to
decide how large the folio should be that we're reading into; if it's a
random read workload, we'll stick to order-0 pages, but if we're getting
good hit rate from the linear scan, we'll increase the size (although
we won't go past PMD size).
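
The VM_HUGEPAGE case can be sketched as plain arithmetic. This is a userspace
model, not the kernel implementation: the toy_* names are hypothetical, and
TOY_PMD_ORDER = 9 assumes 4KiB base pages with 2MiB PMDs (as on x86-64).
Treating the second PMD-size folio as the async part is one reading of the
description above. The non-VM_HUGEPAGE sizing heuristic is not modelled.

```c
/* Assumption: 4KiB base pages and 2MiB PMDs, so a PMD spans 512 pages. */
#define TOY_PMD_ORDER	9u
#define TOY_PMD_NR	(1UL << TOY_PMD_ORDER)

/* Hypothetical description of one readahead request, in units of pages. */
struct toy_ra_plan {
	unsigned long start;		/* first page index to read */
	unsigned long size;		/* total pages to read */
	unsigned long async_size;	/* trailing pages treated as async */
	unsigned int order;		/* folio order to allocate */
};

/* Plan the readahead for a fault at @index in a VM_HUGEPAGE mapping. */
static struct toy_ra_plan toy_plan_hugepage_ra(unsigned long index)
{
	struct toy_ra_plan p;

	/* Align the start down to a PMD boundary. */
	p.start = index & ~(TOY_PMD_NR - 1);
	/* Read two PMD-size folios ... */
	p.size = 2 * TOY_PMD_NR;
	/* ... the second of which gives a linear scan its async readahead. */
	p.async_size = TOY_PMD_NR;
	p.order = TOY_PMD_ORDER;
	return p;
}
```

Under those assumptions, a fault at page index 700 yields a plan starting at
index 512 and covering 1024 pages, i.e. two order-9 folios.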

There's also the ->map_pages() optimisation which handles page faults
locklessly, and will fall back to ->fault() if there's even a light
breeze.  I don't think that's of any particular use in answering your
question, so I'm not going into details about it.

I'm not sure I understand the code that's being modified well enough to
be able to give you a straight answer to your question, but hopefully
this is helpful to you.