ZONE_DEVICE refcounting

Hi,

I have been looking at fixing up ZONE_DEVICE refcounting again. Specifically, I
have been looking at fixing the 1-based refcounts that are currently used for
FS DAX pages (and p2pdma pages, but that's trivial).

This started with the simple idea of "just subtract one from the
refcounts everywhere and that will fix the off-by-one". Unfortunately
it's not that simple. For starters, doing a simple conversion like that
requires allowing pages to be mapped with zero refcounts. That seems
wrong. It also makes it harder to tell pages that are idle with respect to
I/O apart from pages that are merely mapped.

So instead I'm thinking of doing something along the lines of the following:

1. Refcount FS DAX pages normally, i.e., map them with vm_insert_page() and
   increment the refcount in line with the mapcount, and decrement it when
   pages are unmapped.

2. As with normal pages, these pages are considered free when the refcount
   drops to zero.

3. Because these are treated as normal pages for refcounting, we no longer map
   them as pte_devmap() (possibly freeing up a PTE bit).

4. PMD sized FS DAX pages get treated the same as normal compound pages.

5. This means we need to allow compound ZONE_DEVICE pages. Tail pages share
   the page->pgmap field with page->compound_head, but this isn't a problem
   because the LSB of page->pgmap is free and we can still get the pgmap from
   compound_head(page)->pgmap.

6. When FS DAX pages are freed they notify filesystem drivers. This can be done
   from the pgmap->ops->page_free() callback.

7. We could probably get rid of the pgmap refcounting because we can just scan
   pages and look for any pages with non-zero references and wait for them to be
   freed whilst ensuring no new mappings can be created (some drivers do a
   similar thing for private pages today). This might be a follow-up change.

I have made good progress implementing the above, and am reasonably confident I
can make it work (I already have some tests exercising these code paths that
pass).

However, my knowledge of the filesystem layer is a bit thin, so before going too
much further down this path I was hoping to get some feedback on the overall
direction, to see if there are any corner cases or other potential problems I
have missed that may prevent the above from being practical.

If not, I will clean my series up and post it as an RFC. Thanks.

 - Alistair



