On 8 Feb 2023, at 14:54, Matthew Wilcox wrote:

> On Wed, Feb 08, 2023 at 02:36:41PM -0500, Zi Yan wrote:
>> On 7 Feb 2023, at 11:51, Matthew Wilcox wrote:
>>
>>> On Tue, Feb 07, 2023 at 11:23:31AM -0500, Zi Yan wrote:
>>>> On 24 Jan 2023, at 13:13, Matthew Wilcox wrote:
>>>>
>>>>> Once we get to the part of the folio journey where we have
>>>>> one-pointer-per-page, we can't afford to maintain per-page state.
>>>>> Currently we maintain a per-page mapcount, and that will have to go.
>>>>> We can maintain extra state for a multi-page folio, but it has to be a
>>>>> constant amount of extra state no matter how many pages are in the folio.
>>>>>
>>>>> My proposal is that we maintain a single mapcount per folio, and its
>>>>> definition is the number of (vma, page table) tuples which have a
>>>>> reference to any pages in this folio.
>>>>
>>>> How about having two, full_folio_mapcount and partial_folio_mapcount?
>>>> If partial_folio_mapcount is 0, we can have a fast path without doing
>>>> anything at page level.
>>>
>>> A fast path for what? I don't understand your vision; can you spell it
>>> out for me? My current proposal is here:
>>
>> A fast code path for only handling folios as a whole. For cases that
>> subpages are mapped from a folio, traversing through subpages might be
>> needed and will be slow. A code separation might be cleaner and makes
>> folio as a whole handling quicker.
>
> To be clear, in this proposal, there is no subpage mapcount. I've got
> my eye on one struct folio per allocation, so there will be no more
> tail pages. The proposal has one mapcount, and that's it. I'd be
> open to saying "OK, we need two mapcounts", but not to anything that
> needs to scale per number of pages in the folio.
>
>> For your proposal, "How many VMAs have one-or-more pages of this folio mapped"
>> should be the responsibility of rmap. We could add a counter to rmap
>> instead. It seems that you are mixing page table mapping with virtual
>> address space (VMA) mapping together.
>
> rmap tells you how many VMAs cover this folio. It doesn't tell you
> how many of those VMAs have actually got any pages from it mapped.
> It's also rather slower than a simple atomic_read(), so I think
> you'll have an uphill battle trying to convince people to use rmap
> for this purpose.
>
> I'm not sure what you mean by "add a counter to rmap"? One count
> per mapped page in the vma?
>
>>>
>>> https://lore.kernel.org/linux-mm/Y+FkV4fBxHlp6FTH@xxxxxxxxxxxxxxxxxxxx/
>>>
>>> The three questions we need to be able to answer (in my current
>>> understanding) are laid out here:
>>>
>>> https://lore.kernel.org/linux-mm/Y+HblAN5bM1uYD2f@xxxxxxxxxxxxxxxxxxxx/
>>
>> I think we probably need to clarify the definition of "map" in your
>> questions. Does it mean mapped by page tables or VMAs? When a page
>> is mapped into a VMA, it can be mapped by one or more page table entries,
>> but not the other way around, right? Or is shared page table entry merged
>> now so that more than one VMAs can use a single page table entry to map
>> a folio?
>
> Mapped by page tables, just like today. It'd be quite the change to
> figure out the mapcount of a page newly brought into the page cache;
> we'd have to do an rmap walk to see how many mapcounts to give it.
> I don't think this is a great idea.
>
> As far as I know, shared page tables are only supported by hugetlbfs,
> and I prefer to stick cheese in my ears and pretend they don't exist.
>
> To be absolutely concrete about this, my proposal is:
>
> Folio brought into page cache has mapcount 0 (whether or not there are any VMAs
> that cover it)
> When we take a page fault on one of the pages in it, its mapcount
> increases from 0 to 1.
> When we take another page fault on a page in it, we do a pvmw to
> determine if any pages from this folio are already mapped by this VMA;
> we see that there is one and we do not increment the mapcount.
> We partially munmap() so that we need to unmap one of the pages.
> We remove it from the page tables and call page_remove_rmap().
> That does another pvmw and sees there's still a page in this folio
> mapped by this VMA, does not decrement the refcount
> We truncate() the file smaller than the position of the folio, which
> causes us to unmap the rest of the folio. The pvmw walk detects no
> more pages from this folio mapped and we decrement the mapcount.
>
> Clear enough?

Yes. Thanks.

--
Best Regards,
Yan, Zi
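
For reference, the sequence quoted above can be replayed in a small userspace toy model. This is only a sketch of the counting rule as I read it, not kernel code: `Folio`, `Vma`, `fault()`, `unmap()`, `maps_any()` and `replay_example()` are hypothetical names made up for illustration, the pvmw walk is replaced by a plain set lookup, and each VMA is assumed to have exactly one page table, so a (vma, page table) tuple collapses to a VMA.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Folio:
    """Toy folio: constant-size state, no per-page mapcount."""
    nr_pages: int
    mapcount: int = 0  # number of VMAs mapping one or more of its pages


@dataclass(eq=False)
class Vma:
    """Toy VMA; `ptes` stands in for its page tables (an assumption)."""
    ptes: dict = field(default_factory=dict)  # Folio -> set of mapped page indices

    def maps_any(self, folio):
        """Stand-in for the pvmw walk: is any page of `folio` mapped here?"""
        return bool(self.ptes.get(folio))


def fault(folio, vma, index):
    """Map page `index`; bump mapcount only for this VMA's first page."""
    assert 0 <= index < folio.nr_pages
    if not vma.maps_any(folio):
        folio.mapcount += 1
    vma.ptes.setdefault(folio, set()).add(index)


def unmap(folio, vma, index):
    """Unmap page `index`; drop mapcount only when no page remains mapped."""
    vma.ptes[folio].remove(index)
    if not vma.maps_any(folio):
        folio.mapcount -= 1


def replay_example():
    """Replay the quoted steps and record the mapcount after each one."""
    folio, vma = Folio(nr_pages=4), Vma()
    counts = [folio.mapcount]   # folio in page cache, nothing mapped
    fault(folio, vma, 0)        # first page fault
    counts.append(folio.mapcount)
    fault(folio, vma, 1)        # second fault, same VMA
    counts.append(folio.mapcount)
    unmap(folio, vma, 0)        # partial munmap(), one page still mapped
    counts.append(folio.mapcount)
    unmap(folio, vma, 1)        # truncate() unmaps the rest
    counts.append(folio.mapcount)
    return counts


if __name__ == "__main__":
    print(replay_example())
```

The only state kept in the toy folio is the single counter; which pages are mapped lives on the VMA side, which is the property the proposal asks for. The real cost of the pvmw walk is not modeled here.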