On Mon Feb 24, 2025 at 11:55 AM EST, David Hildenbrand wrote: > Let's implement an alternative when per-page mapcounts in large folios > are no longer maintained -- soon with CONFIG_NO_PAGE_MAPCOUNT. > > For large folios, we'll return the per-page average mapcount within the > folio, except when the average is 0 but the folio is mapped: then we > return 1. > > For hugetlb folios and for large folios that are fully mapped > into all address spaces, there is no change. > > As an alternative, we could simply return 0 for non-hugetlb large folios, > or disable this legacy interface with CONFIG_NO_PAGE_MAPCOUNT. > > But the information exposed by this interface can still be valuable, and > frequently we deal with fully-mapped large folios where the average > corresponds to the actual page mapcount. So we'll leave it like this for > now and document the new behavior. > > Note: this interface is likely not very relevant for performance. If > ever required, we could try doing a rather expensive rmap walk to collect > precisely how often this folio page is mapped. > > Signed-off-by: David Hildenbrand <david@xxxxxxxxxx> > --- > Documentation/admin-guide/mm/pagemap.rst | 7 +++++- > fs/proc/internal.h | 31 ++++++++++++++++++++++++ > fs/proc/page.c | 19 ++++++++++++--- > 3 files changed, 53 insertions(+), 4 deletions(-) > > diff --git a/Documentation/admin-guide/mm/pagemap.rst b/Documentation/admin-guide/mm/pagemap.rst > index caba0f52dd36c..49590306c61a0 100644 > --- a/Documentation/admin-guide/mm/pagemap.rst > +++ b/Documentation/admin-guide/mm/pagemap.rst > @@ -42,7 +42,12 @@ There are four components to pagemap: > skip over unmapped regions. > > * ``/proc/kpagecount``. This file contains a 64-bit count of the number of > - times each page is mapped, indexed by PFN. > + times each page is mapped, indexed by PFN. Some kernel configurations do > + not track the precise number of times a page part of a larger allocation > + (e.g., THP) is mapped. In these configurations, the average number of > + mappings per page in this larger allocation is returned instead. However, > + if any page of the large allocation is mapped, the returned value will > + be at least 1. > > The page-types tool in the tools/mm directory can be used to query the > number of times a page is mapped. > diff --git a/fs/proc/internal.h b/fs/proc/internal.h > index 1695509370b88..16aa1fd260771 100644 > --- a/fs/proc/internal.h > +++ b/fs/proc/internal.h > @@ -174,6 +174,37 @@ static inline int folio_precise_page_mapcount(struct folio *folio, > return mapcount; > } > > +/** > + * folio_average_page_mapcount() - Average number of mappings per page in this > + * folio > + * @folio: The folio. > + * > + * The average number of present user page table entries that reference each > + * page in this folio as tracked via the RMAP: either referenced directly > + * (PTE) or as part of a larger area that covers this page (e.g., PMD). > + * > + * Returns: The average number of mappings per page in this folio. 0 for > + * folios that are not mapped to user space or are not tracked via the RMAP > + * (e.g., shared zeropage). > + */ > +static inline int folio_average_page_mapcount(struct folio *folio) > +{ > + int mapcount, entire_mapcount; > + unsigned int adjust; > + > + if (!folio_test_large(folio)) > + return atomic_read(&folio->_mapcount) + 1; > + > + mapcount = folio_large_mapcount(folio); > + entire_mapcount = folio_entire_mapcount(folio); > + if (mapcount <= entire_mapcount) > + return entire_mapcount; > + mapcount -= entire_mapcount; > + > + adjust = folio_large_nr_pages(folio) / 2; Is there any reason for choosing this adjust number? A comment might be helpful in case people want to change it later, either with some reasoning or just saying it is chosen empirically. > + return ((mapcount + adjust) >> folio_large_order(folio)) + > + entire_mapcount; > +} > /* > * array.c > */ > diff --git a/fs/proc/page.c b/fs/proc/page.c > index a55f5acefa974..4d3290cc69667 100644 > --- a/fs/proc/page.c > +++ b/fs/proc/page.c > @@ -67,9 +67,22 @@ static ssize_t kpagecount_read(struct file *file, char __user *buf, > * memmaps that were actually initialized. > */ > page = pfn_to_online_page(pfn); > - if (page) > - mapcount = folio_precise_page_mapcount(page_folio(page), > - page); > + if (page) { > + struct folio *folio = page_folio(page); > + > + if (IS_ENABLED(CONFIG_PAGE_MAPCOUNT)) { > + mapcount = folio_precise_page_mapcount(folio, page); > + } else { > + /* > + * Indicate the per-page average, but at least "1" for > + * mapped folios. > + */ > + mapcount = folio_average_page_mapcount(folio); > + if (!mapcount && folio_test_large(folio) && > + folio_mapped(folio)) > + mapcount = 1; This should be part of folio_average_page_mapcount() right? Otherwise, the comment on folio_average_page_mapcount() is not correct, since it can return 0 when a folio is mapped to user space. > + } > + } > > if (put_user(mapcount, out)) { > ret = -EFAULT; -- Best Regards, Yan, Zi