On 1/27/22 13:57, Matthew Wilcox wrote:
As promised, here's a half-baked proposal for making folio_mapcount() significantly cheaper at the cost of making it less precise. I appreciate that folio_mapcount() is not upstream yet, so take a look at total_mapcount() if you want to understand what I'm talking about.

For a 2MB folio on a 4k architecture, you have to check 512 cachelines to determine how many times a folio is mapped. That's 32kB of memory, which is a good chunk of your L1 cache. The problem is that every PTE mapping increments the ->mapcount of each individual page (and the number of PMD mappings is stored separately). To find out how many times the entire folio is mapped, you've got to look at each constituent page. Added to that, each increment of any of the ->mapcount bumps the refcount on the head page. That's a lot of atomic ops, and we've had some problems where the page refcount has been attacked resulting in overflow.

I would like to start counting folio mapcounts in a more Discworld Troll manner. Zero, One, Two, Many. That limits the total number of refcount increments to 3. Once you reach "Many", you've essentially lost count, and you need to walk the interval tree to figure out exactly how many mappings there are (this means we can no longer use mapcount to decide to stop walking the rmap, but I think that's OK?) You can decrement from Two to One and One to Zero, but you can't decrement from Many to Two. If you walk the rmap and discover there are less than Many mappings, you can set mapcount to Two, One or Zero (adjusting page refcount at the same time).

The mapcount would also no longer count the number of individual PTE or PMD mappings. Instead, it would be the number of VMAs which contain at least one page table reference to this folio.

One advantage to this scheme is that it makes something like 30 bits available in struct page. I'm sure we'll be able to think of some good uses for them. PageDoubleMap also goes away (because we no longer care

Such as upgrading from:

    page_maybe_dma_pinned()

to:

    oh_yes_page_is_most_definitely_dma_pinned() !

:)

...I just can't let that idea go. haha.

thanks,
--
John Hubbard
NVIDIA

whether the folio is mapped with PMDs or PTEs). So ... what's going to be made catastrophically slower by this scheme? Maybe something involving anonymous pages? Those tend to be my blind spot.
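
For anyone following along, the Zero, One, Two, Many counting quoted above could be sketched in plain userspace C roughly as below. This is only an illustration under stated assumptions, not the proposed kernel code: troll_count, troll_inc(), troll_dec() and troll_resync() are hypothetical names, the operations are not atomic, and the rmap walk is replaced by an exact count handed in by the caller.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical saturating counter: 0, 1, 2, then "Many" (3 or more). */
enum troll_count {
    TROLL_ZERO = 0,
    TROLL_ONE  = 1,
    TROLL_TWO  = 2,
    TROLL_MANY = 3
};

/*
 * One more VMA maps the folio. Counts up but saturates at Many.
 * Returns true if the stored value changed, which is the only case
 * where the sketch assumes a refcount increment would be needed.
 */
static bool troll_inc(enum troll_count *c)
{
    if (*c == TROLL_MANY)
        return false;
    *c = (enum troll_count)(*c + 1);
    return true;
}

/*
 * One VMA stops mapping the folio. Two -> One and One -> Zero are
 * exact. Many cannot be decremented because the exact count was lost.
 * Returns true if the stored value changed.
 */
static bool troll_dec(enum troll_count *c)
{
    assert(*c != TROLL_ZERO);
    if (*c == TROLL_MANY)
        return false;
    *c = (enum troll_count)(*c - 1);
    return true;
}

/*
 * After an exact recount (a stand-in for walking the rmap), store the
 * result clamped to Many.
 */
static void troll_resync(enum troll_count *c, unsigned long exact_mappings)
{
    if (exact_mappings >= TROLL_MANY)
        *c = TROLL_MANY;
    else
        *c = (enum troll_count)exact_mappings;
}
```

In this sketch a counter that has reached Many stays there across any number of troll_dec() calls and only comes back down via troll_resync(), which mirrors the "can't decrement from Many to Two" rule in the quoted text.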