Re: folio mapcount

On Wed, Dec 15, 2021 at 09:55:20PM +0000, Matthew Wilcox wrote:
> I've been trying to understand whether we can simplify the mapcount
> handling for folios from the current situation with THPs.  Let me
> quote the commit message from 53f9263baba6:
> 
> > mm: rework mapcount accounting to enable 4k mapping of THPs
> >
> > We're going to allow mapping of individual 4k pages of a THP compound
> > page.  It means we need to track the mapcount on a per-small-page
> > basis.
> >
> > The straightforward approach is to use ->_mapcount in all subpages to
> > track how many times each subpage is mapped, with PMDs and PTEs
> > combined.  But this is rather expensive: mapping or unmapping a THP
> > page with a PMD would require HPAGE_PMD_NR atomic operations instead
> > of the single one we have now.
> >
> > The idea is to store separately how many times the page was mapped as
> > a whole -- compound_mapcount.  This frees up ->_mapcount in the
> > subpages to track the PTE mapcount.
> >
> > We use the same approach as with the compound page destructor and
> > compound order to store compound_mapcount: space in the first tail
> > page, ->mapping this time.
> >
> > Any time we map/unmap a whole compound page (THP or hugetlb), we
> > increment/decrement compound_mapcount.  When we map part of a compound
> > page with a PTE, we operate on the ->_mapcount of the subpage.
> >
> > page_mapcount() counts both PTE and PMD mappings of the page.
> >
> > Basically, the mapcount for a subpage is spread over two counters,
> > which makes it tricky to detect when the last mapping of a page goes
> > away.
> >
> > We introduced PageDoubleMap() for this.  When we split a THP PMD for
> > the first time and there is another PMD mapping left, we offset
> > ->_mapcount in all subpages up by one and set PG_double_map on the
> > compound page.  These additional references go away with the last
> > compound_mapcount.
> >
> > This approach provides a way to detect when the last mapping goes away
> > on a per-small-page basis without introducing new overhead for the
> > most common cases.
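
For concreteness, the dual-counter scheme described above boils down to
roughly the following (a simplified sketch of page_mapcount() from this
era; debug checks and the file-THP special case elided):

static inline int page_mapcount(struct page *page)
{
        /* ->_mapcount starts at -1 for an unmapped page, hence the +1. */
        int mapcount = atomic_read(&page->_mapcount) + 1;

        if (PageCompound(page)) {
                struct page *head = compound_head(page);

                /*
                 * PMD mappings of the whole compound page, stored in
                 * the first tail page.
                 */
                mapcount += atomic_read(compound_mapcount_ptr(head)) + 1;

                /* PG_double_map offsets every subpage by one; undo it. */
                if (PageDoubleMap(head))
                        mapcount--;
        }
        return mapcount;
}
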
> 
> What breaks if we simply track any mapping (whether by PMD or PTE)
> as an increment to the head page's (aka the folio's) mapcount?

The obvious answer is CoW: as discussed yesterday, we need the exact
mapcount to know whether the page can be re-used or has to be copied.

Consider the case where you have a folio mapped with a PMD that is then
split into a PTE page table (as with mprotect()).  You then get a WP page
fault on a page whose mapcount == 512.  How would you know whether the 4k
page can be re-used?
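
To make the problem concrete: the write-protect fault path has to make a
decision like the one below, which needs the exact per-4k mapcount.  An
illustrative sketch only -- can_reuse_page() is a hypothetical helper,
not the actual do_wp_page() logic:

static bool can_reuse_page(struct page *page)
{
        /*
         * Exactly one mapping of this 4k page left: we are the sole
         * user and can write in place.  More than one: must copy.
         */
        return page_mapcount(page) == 1;
}

A folio-wide count of 512 after a PMD split (all mappings belonging to
one process) would be indistinguishable from genuine sharing.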

We also need to detect when the last mapping of a 4k page in the folio
goes away, to trigger the deferred_split_huge_page() logic.
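
The unmap side relies on the same per-4k information.  A simplified
sketch, modelled on page_remove_rmap() (locking and statistics elided;
unmap_subpage() is a made-up name):

static void unmap_subpage(struct page *page)
{
        /* Not the last PTE mapping of this subpage?  Nothing to do. */
        if (!atomic_add_negative(-1, &page->_mapcount))
                return;

        /* Last mapping gone while still a THP: queue a deferred split. */
        if (PageTransCompound(page))
                deferred_split_huge_page(compound_head(page));
}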

> Essentially, we make the head mapcount 'the number of VMAs which contain
> a reference to any page in this folio'.

Okay, so you would have mapcount == 2 or 3 for the mprotect() case above
(the split leaves two or three VMAs covering the folio), not 512.  But it
doesn't help answer the question of whether the page can be re-used: you
would still need an rmap walk to get the answer.

Note also that the VMA lifecycle is different from the page lifecycle:
MADV_DONTNEED removes the mappings but leaves the VMA intact.  Who would
decrement the mapcount there?
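
This is easy to demonstrate from userspace (minimal demo, error handling
omitted): the mappings are zapped, but the VMA survives, so there is no
VMA-lifetime event to hang a mapcount decrement on.

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
        size_t len = 2UL * 1024 * 1024;         /* one PMD's worth */
        char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

        memset(p, 1, len);              /* fault pages in: mapcounts go up */
        madvise(p, len, MADV_DONTNEED); /* page tables zapped, VMA intact */

        /* The VMA is still there: this faults in fresh zero pages. */
        printf("%d\n", p[0]);           /* prints 0 */
        return 0;
}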

> We can remove PageDoubleMap. The tail refcounts will all be 0.  If it's
> useful, we could introduce a 'partial_mapcount' which would be <=
> mapcount (but I don't know if it's useful).  Splitting a PMD would not
> change ->_mapcount.  Splitting the folio already causes the folio to be
> unmapped, so page faults will naturally re-increment ->_mapcount of each
> subpage.
> 
> We might need some additional logic to treat a large folio (aka compound
> page) as a single unit; that is, when we fault on one page, we place
> entries for all pages in this folio (that fit ...) into the page tables,
> so that we only account it once, even if it's not compatible with using
> a PMD.
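
For illustration, the mechanism described above would look roughly like
this (all names hypothetical -- install_pte() and
folio_add_vma_mapcount() are illustrative, not existing kernel API; addr
is the virtual address of the folio's first subpage):

static void fault_in_whole_folio(struct vm_area_struct *vma,
                                 struct folio *folio, unsigned long addr)
{
        unsigned long nr = folio_nr_pages(folio);
        unsigned long i;

        for (i = 0; i < nr; i++) {
                unsigned long a = addr + i * PAGE_SIZE;

                /* Only map the subpages that fit inside the VMA. */
                if (a < vma->vm_start || a >= vma->vm_end)
                        continue;
                install_pte(vma, a, folio_page(folio, i));
        }
        /* Account the mapping once per VMA, at the folio level. */
        folio_add_vma_mapcount(folio);
}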

I still don't see a way to simplify mapcount handling for THPs.  But I
may be biased, because I'm the author of the current scheme.

Please, prove me wrong. I want to be mistaken. :)

-- 
 Kirill A. Shutemov



