On Mon, Jan 29, 2024 at 01:05:04PM +0100, David Hildenbrand wrote:
> As PTE-mapped large folios become more relevant (mTHP [1]) and there is the
> desire to shrink the metadata allocated for such large folios as well
> (memdesc [2]), how we track folio mappings gets more relevant. Over the
> years, we used folio mapping information to answer various questions: is
> this folio mapped by somebody else? do we have to COW on write fault? how do
> we adjust memory statistics? ...
>
> Let's talk about ongoing work in the mapcount area, get a common
> understanding of what the users of the different mapcounts are and what the
> implications of removing some would be: which questions could we answer
> differently, which questions would we not be able to answer precisely
> anymore, and what would be the implications of such changes?
>
> For example, can we tolerate some imprecise memory statistics? How
> expressive is the PSS when large folios are only partially mapped? Would we
> need a transition period and glue changes to a new CONFIG_ option? Do we
> really have to support THP and friends on 32bit?

Excellent topics to cover. I have some of my own questions ...

Are we in danger of overflowing page refcount too easily? Pincount
isn't an issue here; we're talking about large folios, so pincount gets
its own field. But with tracking one mapcount per PTE mapping of a
folio, we can easily increment a PMD-sized folio's refcount by 512 per
VMA. Now we only need 2^22 VMAs to hit the 2^31 limit before the
page->refcount protections go into effect and operations start failing.

How / do we need to track mapcount for pages mapped to userspace which
are neither file-backed, nor anonymous mappings? eg drivers pass
vmalloc memory to vmf_insert_page() in their ->mmap handler.

What do VM_PFNMAP and VM_MIXEDMAP really imply? The documentation here
is a little sparse.
And that's sad, because I think we expect device driver writers to use them, and without clear documentation of what they actually do, they're going to be misused.