On Wed, May 25, 2022 at 06:23:52PM -0700, Zach O'Keefe wrote: > On Wed, May 25, 2022 at 12:07 PM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote: > > The khugepaged code (like much of the mm used to) assumes that memory > > comes in two sizes, PTE and PMD. That's still true for anon and shmem > > for now, but hopefully we'll start managing both anon & shmem memory in > > larger chunks, without necessarily going as far as PMD. > > > > I think the purpose of khugepaged should continue to be to construct > > PMD-size pages; I don't see the point of it wandering through process VMs > > replacing order-2 pages with order-5 pages. I may be wrong about that, > > of course, so feel free to argue with me. > > I'd agree here. > > > Anyway, that meaning behind that comment is that the PageTransCompound() > > test is going to be true on any compound page (TransCompound doesn't > > check that the page is necessarily a THP). So that particular test should > > be folio_test_pmd_mappable(), but there are probably other things which > > ought to be changed, including converting the entire file from dealing > > in pages to dealing in folios. > > Right, at this point, the page might be a pmd-mapped THP, or it could > be a pte-mapped compound page (I'm unsure if we can encounter compound > pages outside hugepages). Today, there is a way. We can find a folio with an order between 0 and PMD_ORDER if the underlying filesystem supports large folios and the file is executable and we've enabled CONFIG_READ_ONLY_THP_FOR_FS. In this case, we'll simply skip over it because the code believes that means it's already a PMD. > If we could tell it's already pmd-mapped, we're done :) IIUC, > folio_test_pmd_mappable() is a necessary but not sufficient condition > to determine this. It is necessary, but from khugepaged's point of view, it's sufficient because khugepaged's job is to create PMD-sized folios -- it's not up to khugepaged to ensure that PMD-sized folios are actually mapped using a PMD. There may be some other component of the system (eg DAMON?) which has chosen to temporarily map the PMD-sized folio using PTEs in order to track whether the memory is all being used. It may also be the case that (for file-based memory), the VMA is mis-aligned and despite creating a PMD-sized folio, it can't be mapped with a PMD. > Else, if it's not, is it safe to try and continue? Suppose we find a > folio of 0 < order < HPAGE_PMD_ORDER. Are we safely able to try and > extend it, or will we break some filesystems that expect a certain > order folio? We're not giving filesystems the opportunity to request that ;-) Filesystems are expected to handle folios of arbitrary order (if they claim the ability to support large folios at all). In practice, I've capped the folio creation size at PMD_ORDER (because I don't want to track down all the places that assume pmd_page() is necessarily a head page), but filesystems shouldn't rely on it. shmem still expects folios to be of order either 0 or PMD_ORDER. That assumption extends into the swap code and I haven't had the heart to go and fix all those places yet. Plus Neil was doing major surgery to the swap code in the most recent deveopment cycle and I didn't want to get in his way. So I am absolutely fine with khugepaged allocating a PMD-size folio for any inode that claims mapping_large_folio_support(). If any filesystems break, we'll fix them. > > I actually have one patch which starts in that direction, but I haven't > > followed it up yet with all the other patches to that file which will > > be needed: > > Thanks for the head start! Not an expert here, but would you say > converting this file to use folios is a necessary first step? Not _necessary_, but I find it helps keep things clearer. Plus it's something that needs to happen anyway.