Re: Folio mapcount

On Mon, Feb 06, 2023 at 08:34:31PM +0000, Matthew Wilcox wrote:
> On Tue, Jan 24, 2023 at 06:13:21PM +0000, Matthew Wilcox wrote:
> > Once we get to the part of the folio journey where we have 
> > one-pointer-per-page, we can't afford to maintain per-page state.
> > Currently we maintain a per-page mapcount, and that will have to go. 
> > We can maintain extra state for a multi-page folio, but it has to be a
> > constant amount of extra state no matter how many pages are in the folio.
> > 
> > My proposal is that we maintain a single mapcount per folio, and its
> > definition is the number of (vma, page table) tuples which have a
> > reference to any pages in this folio.
> 
> I've been thinking about this a lot more, and I have changed my
> mind.  It works fine to answer the question "Is any page in this
> folio mapped", but it's now hard to answer the question "I have it
> mapped, does anybody else?"  That question is asked, for example,
> in madvise_cold_or_pageout_pte_range().

I'm curious whether it is still fine in those rare cases.  IMHO the real
question is how badly things go wrong when the mapcount should be exactly
1 (the folio is privately owned by a single VMA) but we report 2.

In the MADV_COLD/MADV_PAGEOUT case we would skip COLD or PAGEOUT for some
pages even though we could process them, but is that a deal breaker
(assuming the benefit of the change can be shown to be worthwhile)?
Especially since this only happens when a folio is mapped unaligned,
straddling a page table boundary; see the sketch below.
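
To make that concrete, here is a minimal sketch of the kind of "are we
the exclusive owner" test that madvise_cold_or_pageout_pte_range() relies
on (this is not the real madvise code, and folio_exclusively_mapped() is
a made-up name), written against the proposed per-folio mapcount:

/*
 * Hypothetical helper, for illustration only: the "only act if we are
 * the exclusive owner" style of check that MADV_COLD/MADV_PAGEOUT
 * performs, expressed against a single per-folio mapcount.
 */
static bool folio_exclusively_mapped(struct folio *folio)
{
	/*
	 * Under the (vma, page table) tuple definition, a folio mapped
	 * by one VMA but straddling a PMD boundary has mapcount == 2,
	 * so this returns false and we would skip the folio even
	 * though nobody else maps it.
	 */
	return folio_mapcount(folio) == 1;
}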

Is an unaligned mapping of a folio common?  Are there any other use cases
that could go worse than this one?

(E.g., IIUC superfluous but occasional CoW seems fine)

OTOH...

> 
> With this definition, if the mapcount is 1, it's definitely only mapped
> by us.  If it's more than 2, it's definitely mapped by somebody else (*).
> If it's 2, maybe we have the folio mapped twice, and maybe we have it
> mapped once and somebody else has it mapped once, so we have to consult
> the rmap to find out.  Not fun times.
> 
> (*) If we support folios larger than PMD size, then the answer is more
> complex.
> 
> I now think the mapcount has to be defined as "How many VMAs have
> one-or-more pages of this folio mapped".
> 
> That means that our future folio_add_file_rmap_range() looks a bit
> like this:
> 
> {
> 	bool add_mapcount = true;
> 
> 	if (nr < folio_nr_pages(folio))
> 		add_mapcount = !folio_has_ptes(folio, vma);
> 	if (add_mapcount)
> 		atomic_inc(&folio->_mapcount);
> 
> 	__lruvec_stat_mod_folio(folio, NR_FILE_MAPPED, nr);
> 	if (nr == HPAGE_PMD_NR)
> 		__lruvec_stat_mod_folio(folio, folio_test_swapbacked(folio) ?
> 			NR_SHMEM_PMDMAPPED : NR_FILE_PMDMAPPED, nr);
> 
> 	mlock_vma_folio(folio, vma, nr == HPAGE_PMD_NR);
> }
> 
> bool folio_mapped_in_vma(struct folio *folio, struct vm_area_struct *vma)
> {
> 	unsigned long address = vma_address(&folio->page, vma);
> 	DEFINE_FOLIO_VMA_WALK(pvmw, folio, vma, address, 0);
> 
> 	if (!page_vma_mapped_walk(&pvmw))
> 		return false;
> 	page_vma_mapped_walk_done(&pvmw);
> 	return true;
> }
> 
> ... some details to be fixed here; particularly this will currently
> deadlock on the PTL, so we'd need not only to exclude the current
> PMD from being examined, but also avoid a deadly embrace between
> two threads (do we currently have a locking order defined for
> page table locks at the same height of the tree?)

... it starts to sound scary if it needs to take more than one page table
lock.
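
If we did go down that path, I assume the ordering would have to be by
lock address, the same trick used elsewhere for taking two locks of the
same class.  A made-up sketch (double_ptl_lock() does not exist, this is
just to illustrate the ordering):

/*
 * Hypothetical: take two same-height page table locks in address
 * order, so that two threads doing the same thing cannot deadlock
 * against each other.
 */
static void double_ptl_lock(spinlock_t *ptl1, spinlock_t *ptl2)
{
	if (ptl1 > ptl2)
		swap(ptl1, ptl2);
	spin_lock(ptl1);
	spin_lock_nested(ptl2, SINGLE_DEPTH_NESTING);
}

That only helps if both locks are taken up front, though; it doesn't save
us when we already hold one PTL from the fault path and only then discover
we need to look at the neighbouring page table.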

Thanks,

-- 
Peter Xu