Re: Folio mapcount

On 02.07.23 21:51, Zi Yan wrote:
On 2 Jul 2023, at 7:45, David Hildenbrand wrote:

On 02.07.23 11:50, Yin, Fengwei wrote:


On 7/1/2023 9:17 AM, Zi Yan wrote:
In the kernel, almost all code only cares about: 1) whether a page/folio has
extra pins, by checking if mapcount is equal to refcount + extra, and 2)
whether a page/folio is mapped multiple times. A single mapcount can meet
these two needs.
For 2, how can we know whether a page/folio is mapped multiple times from a
single mapcount? My understanding is we need two counts, as a folio could be
partially mapped.

Yes, a single mapcount is most probably insufficient. I started analyzing all existing users and use cases, trying to avoid walking page tables.

From my understanding, a single mapcount is sufficient for kernel users that
call page_mapcount(), because they either check mapcount against refcount to
see if a page has an extra pin or check mapcount to see if a page is mapped more
than once.
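
To make the two checks concrete, here is a minimal userspace sketch. It is not kernel code: struct folio_model and the model_*() helpers are hypothetical names, and the assumption is that every mapping holds exactly one reference.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of a folio; not a kernel structure. */
struct folio_model {
	int refcount;  /* all references, assumed to include one per mapping */
	int mapcount;  /* the single total mapcount under discussion */
};

/*
 * Use 1: are there references that are explained neither by mappings
 * nor by the references the caller expects (expected_extra)?
 */
static bool model_has_extra_pins(const struct folio_model *f, int expected_extra)
{
	/* Assumption: each mapping contributes one reference. */
	assert(f->refcount >= f->mapcount);
	return f->refcount != f->mapcount + expected_extra;
}

/* Use 2: is the page/folio mapped more than once? */
static bool model_mapped_more_than_once(const struct folio_model *f)
{
	return f->mapcount > 1;
}
```

If a total mapcount counts every PTE mapping, a large folio that a single process maps page by page would also satisfy mapcount > 1 in this model, which seems to be the concern behind the question about partially mapped folios.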


There are cases where we want to know "do we have PTE mappings", but I have yet to write it all up.


If we want to get rid of all (or most) sub-page mapcounts, we'd probably want:

(1) Total mapcount (compound + any sub-page): page_mapped(), mapcount
     vs. refcount games, ...

a single mapcount is sufficient in this case.

Well, that's what I describe here: 1) covers exactly these cases.



(2) Compound mapcount (for PMD/PUD-mappable THP only): (2) - (1) tells
     you if it's only PMD mapped or also PTE-mapped. For example, for
     statistics but also swapout code.
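
A small sketch of how the difference between counters (1) and (2) could answer "only PMD mapped or also PTE-mapped". The struct and helper names are hypothetical, and it assumes a mapping of the entire folio adds exactly 1 to the total mapcount.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the two counters; not a kernel structure. */
struct thp_model {
	int total_mapcount;   /* (1): entire mappings plus all PTE mappings */
	int entire_mapcount;  /* (2): PMD/PUD mappings of the whole folio */
};

/* Number of PTE mappings, derived from the difference of the counters. */
static int model_pte_mappings(const struct thp_model *t)
{
	/* Assumption: every entire mapping is also counted in the total. */
	assert(t->total_mapcount >= t->entire_mapcount);
	return t->total_mapcount - t->entire_mapcount;
}

/* True if the folio is mapped and all mappings cover the entire folio. */
static bool model_only_entire_mapped(const struct thp_model *t)
{
	return t->entire_mapcount > 0 && model_pte_mappings(t) == 0;
}

/* True if at least one page of the folio is mapped via a PTE. */
static bool model_has_pte_mappings(const struct thp_model *t)
{
	return model_pte_mappings(t) > 0;
}
```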

For statistics, it is for NR_{ANON,FILE}_MAPPED and NR_ANON_THP. I wonder
if we can use the number of anonymous/file pages and THPs instead, without
caring about if it is mapped or not.

For swapout, folio_entire_mapcount() is used to estimate if a THP is fully
mapped or not. I wonder if we can get away with another estimation like
total_mapcount() > folio_nr_pages().

What do we gain by that? Again, I don't see a reason to degrade the current state just by trying to achieve 1 mapcount when it really barely matters if we have 2 or 3 instead. Right now we have 513 and with any larger folios significantly more ... than 2 or 3.



(3) Mapcount of first (or any other) subpage (compound+subpage): for
     folio_estimated_sharers().

This is another estimation. I wonder if we can use a different estimation
like total_mapcount() > folio_nr_pages() instead.

At least not for PMD-mapped THP. Maybe we could do with (2). But I recall some cases where it got ugly, will try to remember them.
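
To illustrate why the estimate struggles with PMD-mapped THP, here is a userspace sketch. All names are hypothetical, and it assumes a PMD mapping adds 1 to the total mapcount while each PTE mapping also adds 1.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one process mapping a folio; not kernel code. */
struct mapper_model {
	int pmd_mappings;  /* mappings of the entire folio */
	int pte_mappings;  /* mappings of individual pages */
};

/* Assumption: each PMD mapping and each PTE mapping adds 1 to the total. */
static int model_total_mapcount(const struct mapper_model *m, int nr_mappers)
{
	int total = 0;

	for (int i = 0; i < nr_mappers; i++) {
		assert(m[i].pmd_mappings >= 0 && m[i].pte_mappings >= 0);
		total += m[i].pmd_mappings + m[i].pte_mappings;
	}
	return total;
}

/* The estimate under discussion: total_mapcount() > folio_nr_pages(). */
static bool model_estimated_shared(int total_mapcount, int nr_pages)
{
	return total_mapcount > nr_pages;
}
```

With two processes that each PMD-map a 512-page folio, the total in this model is 2, so the estimate reports "not shared" although two processes map the folio. Two processes that each PTE-map all 512 pages give 1024, which the estimate does flag as shared.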



For anon pages, I'm thinking about remembering an additional

(1) Page/folio creator (MM pointer/identification)
(2) Page/folio creator mapcount

When optimizing a PTE-mapped THP (especially a non-PMD-mappable one) for the fork()+exec() case, we'd have to walk page tables to see if all folio references come from this MM. Remembering the page/folio creator avoids exactly that. We might need a mechanism to synchronize against mapping/unmapping of this folio from the creator concurrently (relevant when mapped into multiple page tables).

creator_mapcount < total_mapcount means multiple MMs map this folio? And this is for
the page exclusive check? Sorry I have not checked the code in detail yet. The sync

Right now we essentially do if !PageAnonExclusive:

if (page_count() != 1)
	copy
reuse

to see if we really hold the only reference to that folio.


If we could stabilize the creator's mapcount, it would be something like

if (f->creator != mm || page_count(f) != f->creators_mapcount)
	copy
reuse


So we wouldn't have to scan page tables to identify if we're responsible for all of the page references via our page tables.


But that's so far only an idea I had when thinking about how to avoid page table scans for the simple fork+exec() case, not matured yet.
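
A userspace sketch of the two reuse checks above, to compare them side by side. Everything here is a hypothetical model, not kernel code: the struct, the helper names, and the assumption that each mapping from the creator holds one reference.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of an anon folio with the proposed extra fields. */
struct anon_folio_model {
	const void *creator;    /* identifies the MM that created the folio */
	int creators_mapcount;  /* mappings from the creator's page tables */
	int page_count;         /* all references to the folio */
};

/* Today's check: reuse only if we hold the single reference. */
static bool model_reuse_today(const struct anon_folio_model *f)
{
	return f->page_count == 1;
}

/* Proposed check: reuse if all references come from the creator's mappings. */
static bool model_reuse_with_creator(const struct anon_folio_model *f,
				     const void *mm)
{
	/* Assumption: each creator mapping contributes one reference. */
	assert(f->creators_mapcount <= f->page_count);
	if (f->creator != mm || f->page_count != f->creators_mapcount)
		return false;  /* copy */
	return true;           /* reuse */
}
```

For a folio with four PTE mappings that all belong to the creator, page_count is 4 in this model: today's check would copy, while the creator-based check would reuse without scanning page tables. One additional reference from elsewhere (page_count 5) makes both checks fall back to copying.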

of creator_mapcount with total_mapcount might have some extra cost. I wonder if
this can be solved by checking num_active_vmas in the anon_vma of a folio.

As we nowadays match the actual references (i.e., page_count() != 1), that's most probably insufficient and, from what I recall, easily less precise.


--
Cheers,

David / dhildenb




