On Fri, May 27, 2022 at 09:27:33AM -0700, Zach O'Keefe wrote: > On Thu, May 26, 2022 at 8:47 PM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote: > > Because PageTransCompound() does not do what it says on the tin. > > > > static inline int PageTransCompound(struct page *page) > > { > > return PageCompound(page); > > } > > > > So any compound page is treated as if it's a PMD-sized page. > > Right - therein lies the problem :) I think I misattributed your > comment "we'll simply skip over it because the code believes that > means it's already a PMD" as a solution, not as the current state of > things. What we need to be able to do is: > > 1) If folio order == 0: do what we've been doing > 2) If folio order == HPAGE_PMD_ORDER: check if it's _actually_ > pmd-mapped. If it is, we're done. If not, continue to step (3) I would not do that part. Just leave it alone and assume everything's good. > 3) Else (folio order > 0 and not pmd-mapped): new magic; hopefully > it's ~ same as step (1) Yes, exactly this. > > > I thought the point / benefit of khugepaged was precisely to try and > > > find places where we can collapse many pte entries into a single pmd > > > mapping? > > > > Ideally, yes. But if a file is mapped at an address which isn't > > PMD-aligned, it can't. Maybe it should just decline to operate in that > > case. > > To make sure I'm not missing anything here: It's not actually > important that the file is mapped at a pmd-aligned address. All that > is important is that the region of memory being collapsed is > pmd-aligned. If we wanted to collapse memory mapped to the start of > the file, then sure, the file has to be mapped suitably. Ah, what you're probably missing is that for file pages/folios, they have to be naturally aligned. The data structure underlying the page cache simply can't cope with askew pages. (It kind of can under some circumstances that are so complicated that they shouldn't be explained, and it's far easier just to say "folios must be naturally aligned within the file") > > > > shmem still expects folios to be of order either 0 or PMD_ORDER. > > > > That assumption extends into the swap code and I haven't had the heart > > > > to go and fix all those places yet. Plus Neil was doing major surgery > > > > to the swap code in the most recent deveopment cycle and I didn't want > > > > to get in his way. > > > > > > > > So I am absolutely fine with khugepaged allocating a PMD-size folio for > > > > any inode that claims mapping_large_folio_support(). If any filesystems > > > > break, we'll fix them. > > > > > > Just for clarification, what is the equivalent code today that > > > enforces mapping_large_folio_support()? I.e. today, khugepaged can > > > successfully collapse file without checking if the inode supports it > > > (we only check that it's a regular file not opened for writing). > > > > Yeah, that's a dodgy hack which needs to go away. But we need a lot > > more filesystems converted to supporting large folios before we can > > delete it. Not your responsibility; I'm doing my best to encourage > > fs maintainers to do this part. > > Got it. In the meantime, do we want to check the old conditions + > mapping_large_folio_support()? Yes, that should work. khugepaged should be free to create large folios if the underlying filesystem supports them OR (executable, read-only, CONFIG_THP_RO, etc, etc). > > > Also, just to check, there isn't anything wrong with following > > > collapse_file()'s approach, even for folios of 0 < order < > > > HPAGE_PMD_ORDER? I.e this part: > > > > > > * Basic scheme is simple, details are more complex: > > > * - allocate and lock a new huge page; > > > * - scan page cache replacing old pages with the new one > > > * + swap/gup in pages if necessary; > > > * + fill in gaps; > > > * + keep old pages around in case rollback is required; > > > * - if replacing succeeds: > > > * + copy data over; > > > * + free old pages; > > > * + unlock huge page; > > > * - if replacing failed; > > > * + put all pages back and unfreeze them; > > > * + restore gaps in the page cache; > > > * + unlock and free huge page; > > > */ > > > > Correct. At least, as far as I know! Working on folios has been quite > > the education for me ... > > Great! Well, perhaps I'll run into a snafu here or there (and > hopefully learn something myself) but this gives me enough confidence > to naively give it a try and see what happens! > > Again, thank you very much for your time, help and advice with this, You're welcome! Thanks for putting in some work on this project!