On 24 Feb 2025, at 16:42, David Hildenbrand wrote: > On 24.02.25 22:23, Zi Yan wrote: >> On 24 Feb 2025, at 16:15, David Hildenbrand wrote: >> >>> On 24.02.25 22:10, Zi Yan wrote: >>>> On 24 Feb 2025, at 16:02, David Hildenbrand wrote: >>>> >>>>> On 24.02.25 21:40, Zi Yan wrote: >>>>>> On Mon Feb 24, 2025 at 11:55 AM EST, David Hildenbrand wrote: >>>>>>> Let's implement an alternative when per-page mapcounts in large folios >>>>>>> are no longer maintained -- soon with CONFIG_NO_PAGE_MAPCOUNT. >>>>>>> >>>>>>> For large folios, we'll return the per-page average mapcount within the >>>>>>> folio, except when the average is 0 but the folio is mapped: then we >>>>>>> return 1. >>>>>>> >>>>>>> For hugetlb folios and for large folios that are fully mapped >>>>>>> into all address spaces, there is no change. >>>>>>> >>>>>>> As an alternative, we could simply return 0 for non-hugetlb large folios, >>>>>>> or disable this legacy interface with CONFIG_NO_PAGE_MAPCOUNT. >>>>>>> >>>>>>> But the information exposed by this interface can still be valuable, and >>>>>>> frequently we deal with fully-mapped large folios where the average >>>>>>> corresponds to the actual page mapcount. So we'll leave it like this for >>>>>>> now and document the new behavior. >>>>>>> >>>>>>> Note: this interface is likely not very relevant for performance. If >>>>>>> ever required, we could try doing a rather expensive rmap walk to collect >>>>>>> precisely how often this folio page is mapped. >>>>>>> >>>>>>> Signed-off-by: David Hildenbrand <david@xxxxxxxxxx> >>>>>>> --- >>>>>>> Documentation/admin-guide/mm/pagemap.rst | 7 +++++- >>>>>>> fs/proc/internal.h | 31 ++++++++++++++++++++++++ >>>>>>> fs/proc/page.c | 19 ++++++++++++--- >>>>>>> 3 files changed, 53 insertions(+), 4 deletions(-) >>>>>>> >>>>>>> diff --git a/Documentation/admin-guide/mm/pagemap.rst b/Documentation/admin-guide/mm/pagemap.rst >>>>>>> index caba0f52dd36c..49590306c61a0 100644 >>>>>>> --- a/Documentation/admin-guide/mm/pagemap.rst >>>>>>> +++ b/Documentation/admin-guide/mm/pagemap.rst >>>>>>> @@ -42,7 +42,12 @@ There are four components to pagemap: >>>>>>> skip over unmapped regions. >>>>>>> * ``/proc/kpagecount``. This file contains a 64-bit count of the number of >>>>>>> - times each page is mapped, indexed by PFN. >>>>>>> + times each page is mapped, indexed by PFN. Some kernel configurations do >>>>>>> + not track the precise number of times a page part of a larger allocation >>>>>>> + (e.g., THP) is mapped. In these configurations, the average number of >>>>>>> + mappings per page in this larger allocation is returned instead. However, >>>>>>> + if any page of the large allocation is mapped, the returned value will >>>>>>> + be at least 1. >>>>>>> The page-types tool in the tools/mm directory can be used to query the >>>>>>> number of times a page is mapped. >>>>>>> diff --git a/fs/proc/internal.h b/fs/proc/internal.h >>>>>>> index 1695509370b88..16aa1fd260771 100644 >>>>>>> --- a/fs/proc/internal.h >>>>>>> +++ b/fs/proc/internal.h >>>>>>> @@ -174,6 +174,37 @@ static inline int folio_precise_page_mapcount(struct folio *folio, >>>>>>> return mapcount; >>>>>>> } >>>>>>> +/** >>>>>>> + * folio_average_page_mapcount() - Average number of mappings per page in this >>>>>>> + * folio >>>>>>> + * @folio: The folio. >>>>>>> + * >>>>>>> + * The average number of present user page table entries that reference each >>>>>>> + * page in this folio as tracked via the RMAP: either referenced directly >>>>>>> + * (PTE) or as part of a larger area that covers this page (e.g., PMD). >>>>>>> + * >>>>>>> + * Returns: The average number of mappings per page in this folio. 0 for >>>>>>> + * folios that are not mapped to user space or are not tracked via the RMAP >>>>>>> + * (e.g., shared zeropage). >>>>>>> + */ >>>>>>> +static inline int folio_average_page_mapcount(struct folio *folio) >>>>>>> +{ >>>>>>> + int mapcount, entire_mapcount; >>>>>>> + unsigned int adjust; >>>>>>> + >>>>>>> + if (!folio_test_large(folio)) >>>>>>> + return atomic_read(&folio->_mapcount) + 1; >>>>>>> + >>>>>>> + mapcount = folio_large_mapcount(folio); >>>>>>> + entire_mapcount = folio_entire_mapcount(folio); >>>>>>> + if (mapcount <= entire_mapcount) >>>>>>> + return entire_mapcount; >>>>>>> + mapcount -= entire_mapcount; >>>>>>> + >>>>>>> + adjust = folio_large_nr_pages(folio) / 2; >>>>> >>>>> Thanks for the review! >>>>> >>>>>> >>>>>> Is there any reason for choosing this adjust number? A comment might be >>>>>> helpful in case people want to change it later, either with some reasoning >>>>>> or just saying it is chosen empirically. >>>>> >>>>> We're dividing by folio_large_nr_pages(folio) (shifting by folio_large_order(folio)), so this is not a magic number at all. >>>>> >>>>> So this should be "ordinary" rounding. >>>> >>>> I thought the rounding would be (mapcount + 511) / 512. >>> >>> Yes, that's "rounding up". >>> >>>> But >>>> that means if one subpage is mapped, the average will be 1. >>>> Your rounding means if at least half of the subpages is mapped, >>>> the average will be 1. Others might think 1/3 is mapped, >>>> the average will be 1. That is why I think adjust looks like >>>> a magic number. >>> >>> I think all callers could tolerate (or benefit) from folio_average_page_mapcount() returning at least 1 in case any page is mapped. >>> >>> There was a reason why I decided to round to the nearest integer instead. >>> >>> Let me think about this once more, I went back and forth a couple of times on this. >> >> Sure. Your current choice might be good enough for now. My intend of >> adding a comment here is just to let people know the adjust can be >> changed in the future. :) > > The following will make the callers easier to read, while keeping > the rounding to the next integer for the other cases untouched. > > +/** > + * folio_average_page_mapcount() - Average number of mappings per page in this > + * folio > + * @folio: The folio. > + * > + * The average number of present user page table entries that reference each > + * page in this folio as tracked via the RMAP: either referenced directly > + * (PTE) or as part of a larger area that covers this page (e.g., PMD). > + * > + * The average is calculated by rounding to the nearest integer; however, > + * if at least a single page is mapped, the average will be at least 1. > + * > + * Returns: The average number of mappings per page in this folio. > + */ > +static inline int folio_average_page_mapcount(struct folio *folio) > +{ > + int mapcount, entire_mapcount, avg; > + > + if (!folio_test_large(folio)) > + return atomic_read(&folio->_mapcount) + 1; > + > + mapcount = folio_large_mapcount(folio); > + if (unlikely(mapcount <= 0)) > + return 0; > + entire_mapcount = folio_entire_mapcount(folio); > + if (mapcount <= entire_mapcount) > + return entire_mapcount; > + mapcount -= entire_mapcount; > + > + /* Round to closest integer ... */ > + avg = (mapcount + folio_large_nr_pages(folio) / 2) >> folio_large_order(folio); > + avg += entire_mapcount; > + /* ... but return at least 1. */ > + return max_t(int, avg, 1); > +} LGTM. Thanks. Best Regards, Yan, Zi