On Mon, Feb 06, 2023 at 08:34:31PM +0000, Matthew Wilcox wrote:
> On Tue, Jan 24, 2023 at 06:13:21PM +0000, Matthew Wilcox wrote:
> > Once we get to the part of the folio journey where we have
> > one-pointer-per-page, we can't afford to maintain per-page state.
> > Currently we maintain a per-page mapcount, and that will have to go.
> > We can maintain extra state for a multi-page folio, but it has to be
> > a constant amount of extra state no matter how many pages are in the
> > folio.
> > 
> > My proposal is that we maintain a single mapcount per folio, and its
> > definition is the number of (vma, page table) tuples which have a
> > reference to any pages in this folio.
> 
> I've been thinking about this a lot more, and I have changed my mind.
> It works fine to answer the question "Is any page in this folio
> mapped", but it's now hard to answer the question "I have it mapped,
> does anybody else?"  That question is asked, for example, in
> madvise_cold_or_pageout_pte_range().

I'm curious whether it could still be fine in rare cases - IMHO the
question is how badly things go wrong when the mapcount should be
exactly 1 (the folio is privately owned by one vma) but we report 2.

In this MADV_COLD/MADV_PAGEOUT case we'd skip colding or paging out
some pages even when we could, but is that a deal breaker (assuming
the benefit of the change can be shown to be worthwhile)?  Especially
since this only happens when a folio is mapped unaligned - is
unaligned mapping of a folio common?

Are there any other use cases that could go worse than this one?
(E.g., IIUC an occasional superfluous CoW seems fine.)

OTOH...

> With this definition, if the mapcount is 1, it's definitely only
> mapped by us.  If it's more than 2, it's definitely mapped by somebody
> else (*).  If it's 2, maybe we have the folio mapped twice, and maybe
> we have it mapped once and somebody else has it mapped once, so we
> have to consult the rmap to find out.  Not fun times.
> 
> (*) If we support folios larger than PMD size, then the answer is
> more complex.
> 
> I now think the mapcount has to be defined as "How many VMAs have
> one-or-more pages of this folio mapped".
> 
> That means that our future folio_add_file_rmap_range() looks a bit
> like this:
> 
> {
> 	bool add_mapcount = true;
> 
> 	if (nr < folio_nr_pages(folio))
> 		add_mapcount = !folio_has_ptes(folio, vma);
> 	if (add_mapcount)
> 		atomic_inc(&folio->_mapcount);
> 
> 	__lruvec_stat_mod_folio(folio, NR_FILE_MAPPED, nr);
> 	if (nr == HPAGE_PMD_NR)
> 		__lruvec_stat_mod_folio(folio, folio_test_swapbacked(folio) ?
> 				NR_SHMEM_PMDMAPPED : NR_FILE_PMDMAPPED, nr);
> 
> 	mlock_vma_folio(folio, vma, nr == HPAGE_PMD_NR);
> }
> 
> bool folio_mapped_in_vma(struct folio *folio, struct vm_area_struct *vma)
> {
> 	unsigned long address = vma_address(&folio->page, vma);
> 	DEFINE_FOLIO_VMA_WALK(pvmw, folio, vma, address, 0);
> 
> 	if (!page_vma_mapped_walk(&pvmw))
> 		return false;
> 	page_vma_mapped_walk_done(&pvmw);
> 	return true;
> }
> 
> ... some details to be fixed here; particularly this will currently
> deadlock on the PTL, so we'd need not only to exclude the current
> PMD from being examined, but also avoid a deadly embrace between
> two threads (do we currently have a locking order defined for
> page table locks at the same height of the tree?)

... it starts to sound scary if it needs to take more than one pgtable
lock.

Thanks,

-- 
Peter Xu
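
For concreteness, the three-way check discussed above - under the
(vma, page table) tuple definition, and only for folios up to PMD size
per the (*) caveat - might look like the sketch below.  This is a
sketch only: folio_mapped_by_others() and folio_mapped_twice_in_vma()
are hypothetical names, not existing kernel functions, and reading
_mapcount with the usual off-by-one convention is an assumption here.

/*
 * Sketch: "I have this folio mapped; does anybody else?" when the
 * folio mapcount counts (vma, page table) tuples.  A folio up to PMD
 * size spans at most two page tables, so one vma contributes at most
 * two tuples.
 */
static bool folio_mapped_by_others(struct folio *folio,
				   struct vm_area_struct *vma)
{
	/* Assumes the usual off-by-one storage: -1 means unmapped. */
	int mapcount = atomic_read(&folio->_mapcount) + 1;

	if (mapcount <= 1)
		return false;	/* at most our own mapping */
	if (mapcount > 2)
		return true;	/* can't all be us: one vma adds <= 2 tuples */
	/*
	 * mapcount == 2 is ambiguous: our vma may map the folio across
	 * two page tables (unaligned), or a second vma may map it once.
	 * Only an rmap walk can tell; folio_mapped_twice_in_vma() is a
	 * hypothetical helper for that walk.
	 */
	return !folio_mapped_twice_in_vma(folio, vma);
}

The mapcount == 2 branch is exactly where the rmap walk - and hence
the PTL ordering concern raised above - comes into play.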