Hey Matthew, Thanks for the response! On Wed, May 25, 2022 at 8:36 PM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote: > > On Wed, May 25, 2022 at 06:23:52PM -0700, Zach O'Keefe wrote: > > On Wed, May 25, 2022 at 12:07 PM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote: > > > The khugepaged code (like much of the mm used to) assumes that memory > > > comes in two sizes, PTE and PMD. That's still true for anon and shmem > > > for now, but hopefully we'll start managing both anon & shmem memory in > > > larger chunks, without necessarily going as far as PMD. > > > > > > I think the purpose of khugepaged should continue to be to construct > > > PMD-size pages; I don't see the point of it wandering through process VMs > > > replacing order-2 pages with order-5 pages. I may be wrong about that, > > > of course, so feel free to argue with me. > > > > I'd agree here. > > > > > Anyway, that meaning behind that comment is that the PageTransCompound() > > > test is going to be true on any compound page (TransCompound doesn't > > > check that the page is necessarily a THP). So that particular test should > > > be folio_test_pmd_mappable(), but there are probably other things which > > > ought to be changed, including converting the entire file from dealing > > > in pages to dealing in folios. > > > > Right, at this point, the page might be a pmd-mapped THP, or it could > > be a pte-mapped compound page (I'm unsure if we can encounter compound > > pages outside hugepages). > > Today, there is a way. We can find a folio with an order between 0 and > PMD_ORDER if the underlying filesystem supports large folios and the > file is executable and we've enabled CONFIG_READ_ONLY_THP_FOR_FS. > In this case, we'll simply skip over it because the code believes that > means it's already a PMD. I think I'm missing something here - sorry. If the folio order is < HPAGE_PMD_ORDER, why does the code think it's a pmd? > > If we could tell it's already pmd-mapped, we're done :) IIUC, > > folio_test_pmd_mappable() is a necessary but not sufficient condition > > to determine this. > > It is necessary, but from khugepaged's point of view, it's sufficient > because khugepaged's job is to create PMD-sized folios -- it's not up to > khugepaged to ensure that PMD-sized folios are actually mapped using > a PMD. I thought the point / benefit of khugepaged was precisely to try and find places where we can collapse many pte entries into a single pmd mapping? > There may be some other component of the system (eg DAMON?) > which has chosen to temporarily map the PMD-sized folio using PTEs > in order to track whether the memory is all being used. It may also > be the case that (for file-based memory), the VMA is mis-aligned and > despite creating a PMD-sized folio, it can't be mapped with a PMD. AFAIK DAMON doesn't do this pmd splitting to do subpage tracking for THPs. Also, I believe retract_page_tables() does make the check to see if the address is suitably hugepage aligned/sized. > > Else, if it's not, is it safe to try and continue? Suppose we find a > > folio of 0 < order < HPAGE_PMD_ORDER. Are we safely able to try and > > extend it, or will we break some filesystems that expect a certain > > order folio? > > We're not giving filesystems the opportunity to request that ;-) > Filesystems are expected to handle folios of arbitrary order (if they > claim the ability to support large folios at all). In practice, I've > capped the folio creation size at PMD_ORDER (because I don't want to track > down all the places that assume pmd_page() is necessarily a head page), > but filesystems shouldn't rely on it. Ok, that's good to hear :) > shmem still expects folios to be of order either 0 or PMD_ORDER. > That assumption extends into the swap code and I haven't had the heart > to go and fix all those places yet. Plus Neil was doing major surgery > to the swap code in the most recent deveopment cycle and I didn't want > to get in his way. > > So I am absolutely fine with khugepaged allocating a PMD-size folio for > any inode that claims mapping_large_folio_support(). If any filesystems > break, we'll fix them. Just for clarification, what is the equivalent code today that enforces mapping_large_folio_support()? I.e. today, khugepaged can successfully collapse file without checking if the inode supports it (we only check that it's a regular file not opened for writing). Also, just to check, there isn't anything wrong with following collapse_file()'s approach, even for folios of 0 < order < HPAGE_PMD_ORDER? I.e this part: * Basic scheme is simple, details are more complex: * - allocate and lock a new huge page; * - scan page cache replacing old pages with the new one * + swap/gup in pages if necessary; * + fill in gaps; * + keep old pages around in case rollback is required; * - if replacing succeeds: * + copy data over; * + free old pages; * + unlock huge page; * - if replacing failed; * + put all pages back and unfreeze them; * + restore gaps in the page cache; * + unlock and free huge page; */ > > > I actually have one patch which starts in that direction, but I haven't > > > followed it up yet with all the other patches to that file which will > > > be needed: > > > > Thanks for the head start! Not an expert here, but would you say > > converting this file to use folios is a necessary first step? > > Not _necessary_, but I find it helps keep things clearer. Plus it's > something that needs to happen anyway. Got it. Perhaps something I can help with then. Thanks again for your time advising on this! Best, Zach