Re: ZONE_DEVICE refcounting

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Mar 08, 2024 at 03:24:35PM +1100, Alistair Popple wrote:
> Hi,
> 
> I have been looking at fixing up ZONE_DEVICE refcounting again. Specifically I
> have been looking  at fixing the 1-based refcounts that are currently used for
> FS DAX pages (and p2pdma pages, but that's trival).
> 
> This started with the simple idea of "just subtract one from the
> refcounts everywhere and that will fix the off by one". Unfortunately
> it's not that simple. For starters doing a simple conversion like that
> requires allowing pages to be mapped with zero refcounts. That seems
> wrong. It also leads to problems detecting idle IO vs. page map pages.
> 
> So instead I'm thinking of doing something along the lines of the following:
> 
> 1. Refcount FS DAX pages normally. Ie. map them with vm_insert_page() and
>    increment the refcount inline with mapcount and decrement it when pages are
>    unmapped.

This is the right thing to do

> 2. As per normal pages the pages are considered free when the refcount drops
>    to zero.
> 
> 3. Because these are treated as normal pages for refcounting we no longer map
>    them as pte_devmap() (possibly freeing up a PTE bit).

Yes, the pmd/pte_devmap() should ideally go away.

> 4. PMD sized FS DAX pages get treated the same as normal compound pages.
> 
> 5. This means we need to allow compound ZONE DEVICE pages. Tail pages share
>    the page->pgmap field with page->compound_head, but this isn't a problem
>    because the LSB of page->pgmap is free and we can still get pgmap from
>    compound_head(page)->pgmap.

Right, this is the actual work - the mm is obviously already happy
with its part, fsdax just need to create a properly sized folio and
map it properly.

> 6. When FS DAX pages are freed they notify filesystem drivers. This can be done
>    from the pgmap->ops->page_free() callback.
> 
> 7. We could probably get rid of the pgmap refcounting because we can just scan
>    pages and look for any pages with non-zero references and wait for them to be
>    freed whilst ensuring no new mappings can be created (some drivers do a
>    similar thing for private pages today). This might be a follow-up
>    change.

Yeah, the pgmap refcounting needs some cleanup for sure.

Jason




[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux