On Mon, Aug 14, 2023 at 11:47:50AM -0700, Zach O'Keefe wrote:
> Willy -- I'm not up-to-date on what is happening on the THP-fs front.
> Should we be checking for a ->huge_fault handler here?

Oh, thank goodness, I thought you were cc'ing me to ask a DAX question ...

From a large folios perspective, filesystems do not implement a special
handler. They call filemap_fault() (directly or indirectly) from their
->fault handler. If there is already a folio in the page cache which
satisfies this fault, we insert it into the page tables (no matter what
size it is). If there is no folio, we call readahead to populate that
index in the page cache, and probably some other indices around it.
That's do_sync_mmap_readahead().

If you look at that, you'll see that we check the VM_HUGEPAGE flag, and
if it is set we align to a PMD boundary and read two PMD-size pages (so
that we can do async readahead for the second page, if we're doing a
linear scan). If the VM_HUGEPAGE flag isn't set, we'll use the readahead
algorithm to decide how large the folio should be that we're reading
into; if it's a random read workload, we'll stick to order-0 pages, but
if we're getting a good hit rate from the linear scan, we'll increase
the size (although we won't go past PMD size).

There's also the ->map_pages() optimisation which handles page faults
locklessly, and will fall back to ->fault() if there's even a light
breeze. I don't think that's of any particular use in answering your
question, so I'm not going into details about it.

I'm not sure I understand the code that's being modified well enough to
be able to give you a straight answer to your question, but hopefully
this is helpful to you.
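
In case it helps to see the do_sync_mmap_readahead() decision described
above written down, here is a rough user-space model of it. This is only
a sketch and not the kernel code: struct ra_model, struct ra_plan,
plan_sync_readahead() and the "grow by one order per hit" policy are all
made up for illustration, and PMD_ORDER of 9 assumes x86-64 with 4KiB
pages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption: x86-64 with 4KiB pages, so one PMD maps 512 pages (2MiB). */
#define PMD_ORDER	9u
#define PMD_NR_PAGES	(1ul << PMD_ORDER)

/* Hypothetical stand-in for the per-file readahead state. */
struct ra_model {
	unsigned int order;	/* folio order chosen by the last readahead */
};

/* Hypothetical result: what this model would read into the page cache. */
struct ra_plan {
	unsigned long start;	/* first page index to read */
	unsigned long nr_pages;	/* total number of pages to read */
	unsigned int order;	/* order of each folio */
};

/*
 * Model of the choice made when a fault finds no folio at 'index'.
 * 'sequential_hit' stands in for "the readahead algorithm thinks this
 * is a linear scan with a good hit rate".
 */
struct ra_plan plan_sync_readahead(unsigned long index, bool vm_hugepage,
				   bool sequential_hit, struct ra_model *ra)
{
	struct ra_plan plan;

	assert(ra != NULL);

	if (vm_hugepage) {
		/* Align down to a PMD boundary and read two PMD-size folios. */
		plan.order = PMD_ORDER;
		plan.start = index & ~(PMD_NR_PAGES - 1);
		plan.nr_pages = 2 * PMD_NR_PAGES;
		return plan;
	}

	if (!sequential_hit)
		ra->order = 0;		/* random reads: stay at order-0 */
	else if (ra->order < PMD_ORDER)
		ra->order++;		/* assumed growth step; capped at PMD size */

	plan.order = ra->order;
	plan.start = index & ~((1ul << plan.order) - 1);
	plan.nr_pages = 1ul << plan.order;
	return plan;
}
```

With VM_HUGEPAGE set, a fault at page index 1000 plans a read of 1024
pages starting at index 512 in this model. Without it, the order only
climbs while the scan keeps hitting, and a single random miss drops it
back to order-0.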