A two-bit folio_mapcount

Matthew Wilcox <willy@xxxxxxxxxxxxx> · Thu, 27 Jan 2022 21:57:05 +0000

As promised, here's a half-baked proposal for making folio_mapcount()
significantly cheaper at the cost of making it less precise.
I appreciate that folio_mapcount() is not upstream yet, so take a look
at total_mapcount() if you want to understand what I'm talking about.

For a 2MB folio on an architecture with 4kB pages, you have to check
512 cachelines (one 64-byte struct page per constituent page) to
determine how many times a folio is mapped.  That's 32kB of memory,
which is a good chunk of your L1 cache.  The problem is that every PTE
mapping increments the ->mapcount of each individual page (and the number
of PMD mappings is stored separately).  To find out how many times the
entire folio is mapped, you've got to look at each constituent page.
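
To make the arithmetic concrete, here is a toy userspace model of that
walk.  It is only a sketch, not the kernel's total_mapcount(): the
toy_* names, the flat layout and the 64-byte struct page size are all
assumptions made for illustration, and it ignores details such as the
double-map adjustment.

```c
#include <stddef.h>

#define TOY_PAGE_SIZE   4096u                 /* assumed base page size */
#define TOY_FOLIO_SIZE  (2u * 1024 * 1024)    /* 2MB folio */
#define TOY_NR_PAGES    (TOY_FOLIO_SIZE / TOY_PAGE_SIZE)   /* 512 */
#define TOY_STRUCT_PAGE 64u   /* assumed sizeof(struct page): one cacheline */

/* Hypothetical stand-in for struct page, padded to one cacheline. */
struct toy_page {
	int mapcount;                         /* PTE mappings of this page */
	char pad[TOY_STRUCT_PAGE - sizeof(int)];
};

/* Hypothetical stand-in for a 2MB folio. */
struct toy_folio {
	int pmd_mapcount;                     /* PMD mappings, kept separately */
	struct toy_page pages[TOY_NR_PAGES];
};

/* Sum every per-page count: one cacheline touched per constituent page. */
static long toy_total_mapcount(const struct toy_folio *f)
{
	long total = f->pmd_mapcount;
	size_t i;

	for (i = 0; i < TOY_NR_PAGES; i++)
		total += f->pages[i].mapcount;
	return total;
}

/* Memory read by the loop above: 512 * 64 bytes. */
static size_t toy_bytes_touched(void)
{
	return (size_t)TOY_NR_PAGES * sizeof(struct toy_page);
}
```

The point of the model is only that the cost scales with the number of
constituent pages, no matter how few mappings actually exist.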

Added to that, each increment of any of the ->mapcounts bumps the
refcount on the head page.  That's a lot of atomic ops, and we've had
some problems where the page refcount has been attacked, resulting in
overflow.

I would like to start counting folio mapcounts in a more Discworld Troll
manner.  Zero, One, Two, Many.  That limits the total number of refcount
increments to 3.  Once you reach "Many", you've essentially lost count,
and you need to walk the interval tree to figure out exactly how many
mappings there are (this means we can no longer use mapcount to decide to
stop walking the rmap, but I think that's OK?)  You can decrement from
Two to One and One to Zero, but you can't decrement from Many to Two.
If you walk the rmap and discover there are fewer than Many mappings,
you can set mapcount to Two, One or Zero (adjusting page refcount at
the same time).
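
The counting rules above could be sketched roughly as follows.  This is
a single-threaded userspace model, not a patch: the toy_* names are
hypothetical, a real version would need atomics, and the rmap walk is
reduced to a number passed in by the caller.

```c
#include <assert.h>

/* The four states of the two-bit count. */
enum { TOY_ZERO = 0, TOY_ONE = 1, TOY_TWO = 2, TOY_MANY = 3 };

/* Hypothetical folio with a saturating mapcount. */
struct toy_counted {
	unsigned int mapcount;   /* TOY_ZERO .. TOY_MANY */
	int refcount;            /* only the references held by mappings */
};

/* Add a mapping.  Once at Many, neither counter is touched again. */
static void toy_map(struct toy_counted *f)
{
	if (f->mapcount == TOY_MANY)
		return;
	f->mapcount++;
	f->refcount++;           /* at most three increments in total */
}

/*
 * Remove a mapping.  Returns 1 if the count was decremented, or 0 if
 * the count is Many and the caller has to walk the rmap instead.
 */
static int toy_unmap(struct toy_counted *f)
{
	if (f->mapcount == TOY_MANY)
		return 0;
	assert(f->mapcount != TOY_ZERO);
	f->mapcount--;
	f->refcount--;
	return 1;
}

/*
 * After an rmap walk that found 'found' mappings, set the count to
 * Zero, One, Two or Many and adjust the refcount to match.
 */
static void toy_resync(struct toy_counted *f, unsigned int found)
{
	unsigned int next = found >= TOY_MANY ? TOY_MANY : found;

	f->refcount += (int)next - (int)f->mapcount;
	f->mapcount = next;
}
```

In this model, ten calls to toy_map() leave the refcount at 3, and only
toy_resync() can bring a folio back down from Many.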

The mapcount would also no longer count the number of individual PTE or
PMD mappings.  Instead, it would be the number of VMAs which contain at
least one page table reference to this folio.

One advantage to this scheme is that it makes something like 30 bits
available in struct page.  I'm sure we'll be able to think of some good
uses for them.  PageDoubleMap also goes away (because we no longer care
whether the folio is mapped with PMDs or PTEs).

So ... what's going to be made catastrophically slower by this scheme?
Maybe something involving anonymous pages?  Those tend to be my blind
spot.