On Fri, Mar 08, 2024 at 03:24:35PM +1100, Alistair Popple wrote:
> Hi,
>
> I have been looking at fixing up ZONE_DEVICE refcounting again. Specifically I
> have been looking at fixing the 1-based refcounts that are currently used for
> FS DAX pages (and p2pdma pages, but that's trivial).
>
> This started with the simple idea of "just subtract one from the
> refcounts everywhere and that will fix the off by one". Unfortunately
> it's not that simple. For starters doing a simple conversion like that
> requires allowing pages to be mapped with zero refcounts. That seems
> wrong. It also leads to problems detecting idle IO vs. page map pages.
>
> So instead I'm thinking of doing something along the lines of the following:
>
> 1. Refcount FS DAX pages normally. Ie. map them with vm_insert_page() and
> increment the refcount inline with mapcount and decrement it when pages are
> unmapped.

This is the right thing to do.

> 2. As per normal pages the pages are considered free when the refcount drops
> to zero.
>
> 3. Because these are treated as normal pages for refcounting we no longer map
> them as pte_devmap() (possibly freeing up a PTE bit).

Yes, the pmd/pte_devmap() should ideally go away.

> 4. PMD sized FS DAX pages get treated the same as normal compound pages.
>
> 5. This means we need to allow compound ZONE DEVICE pages. Tail pages share
> the page->pgmap field with page->compound_head, but this isn't a problem
> because the LSB of page->pgmap is free and we can still get pgmap from
> compound_head(page)->pgmap.

Right, this is the actual work - the mm is obviously already happy with
its part, fsdax just needs to create a properly sized folio and map it
properly.

> 6. When FS DAX pages are freed they notify filesystem drivers. This can be done
> from the pgmap->ops->page_free() callback.
>
> 7. We could probably get rid of the pgmap refcounting because we can just scan
> pages and look for any pages with non-zero references and wait for them to be
> freed whilst ensuring no new mappings can be created (some drivers do a
> similar thing for private pages today). This might be a follow-up
> change.

Yeah, the pgmap refcounting needs some cleanup for sure.

Jason
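
For anyone following point 5, here is a rough userspace sketch of why sharing the field works. It is not kernel code: toy_page, toy_pgmap and the toy_* helpers are hypothetical stand-ins, and the only assumption carried over from the thread is that a tail page stores its head's address with bit 0 set in the same word a head page uses for its pgmap pointer.

```c
#include <stdint.h>

/* Hypothetical stand-ins, not the kernel's struct page / struct dev_pagemap. */
struct toy_pgmap {
	int id;
};

struct toy_page {
	union {
		/* Head (or order-0) page: an aligned pointer, so bit 0 is clear. */
		struct toy_pgmap *pgmap;
		/* Tail page: address of the head page with bit 0 set. */
		uintptr_t compound_head;
	};
};

/* Mark @tail as a tail page of @head by tagging the low bit. */
static void toy_set_tail(struct toy_page *tail, struct toy_page *head)
{
	tail->compound_head = (uintptr_t)head | 1UL;
}

/* Return the head page: strip the tag for a tail, else the page itself. */
static struct toy_page *toy_compound_head(struct toy_page *page)
{
	uintptr_t v = page->compound_head;

	if (v & 1UL)
		return (struct toy_page *)(v - 1UL);
	return page;
}

/* The pgmap is always read from the head page, never from a tail. */
static struct toy_pgmap *toy_page_pgmap(struct toy_page *page)
{
	return toy_compound_head(page)->pgmap;
}
```

Since the low bit alone tells a tail entry apart from a real pgmap pointer, no extra field is needed in this model. The real struct page layout has more constraints than this toy union captures.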