On 24 Feb 2025, at 16:15, David Hildenbrand wrote:

> On 24.02.25 22:10, Zi Yan wrote:
>> On 24 Feb 2025, at 16:02, David Hildenbrand wrote:
>>
>>> On 24.02.25 21:40, Zi Yan wrote:
>>>> On Mon Feb 24, 2025 at 11:55 AM EST, David Hildenbrand wrote:
>>>>> Let's implement an alternative when per-page mapcounts in large folios
>>>>> are no longer maintained -- soon with CONFIG_NO_PAGE_MAPCOUNT.
>>>>>
>>>>> For large folios, we'll return the per-page average mapcount within the
>>>>> folio, except when the average is 0 but the folio is mapped: then we
>>>>> return 1.
>>>>>
>>>>> For hugetlb folios and for large folios that are fully mapped
>>>>> into all address spaces, there is no change.
>>>>>
>>>>> As an alternative, we could simply return 0 for non-hugetlb large folios,
>>>>> or disable this legacy interface with CONFIG_NO_PAGE_MAPCOUNT.
>>>>>
>>>>> But the information exposed by this interface can still be valuable, and
>>>>> frequently we deal with fully-mapped large folios where the average
>>>>> corresponds to the actual page mapcount. So we'll leave it like this for
>>>>> now and document the new behavior.
>>>>>
>>>>> Note: this interface is likely not very relevant for performance. If
>>>>> ever required, we could try doing a rather expensive rmap walk to collect
>>>>> precisely how often this folio page is mapped.
>>>>>
>>>>> Signed-off-by: David Hildenbrand <david@xxxxxxxxxx>
>>>>> ---
>>>>>  Documentation/admin-guide/mm/pagemap.rst |  7 +++++-
>>>>>  fs/proc/internal.h                       | 31 ++++++++++++++++++++++++
>>>>>  fs/proc/page.c                           | 19 ++++++++++++---
>>>>>  3 files changed, 53 insertions(+), 4 deletions(-)
>>>>>
>>>>> diff --git a/Documentation/admin-guide/mm/pagemap.rst b/Documentation/admin-guide/mm/pagemap.rst
>>>>> index caba0f52dd36c..49590306c61a0 100644
>>>>> --- a/Documentation/admin-guide/mm/pagemap.rst
>>>>> +++ b/Documentation/admin-guide/mm/pagemap.rst
>>>>> @@ -42,7 +42,12 @@ There are four components to pagemap:
>>>>>     skip over unmapped regions.
>>>>>   * ``/proc/kpagecount``. This file contains a 64-bit count of the number of
>>>>> -   times each page is mapped, indexed by PFN.
>>>>> +   times each page is mapped, indexed by PFN. Some kernel configurations do
>>>>> +   not track the precise number of times a page part of a larger allocation
>>>>> +   (e.g., THP) is mapped. In these configurations, the average number of
>>>>> +   mappings per page in this larger allocation is returned instead. However,
>>>>> +   if any page of the large allocation is mapped, the returned value will
>>>>> +   be at least 1.
>>>>>  The page-types tool in the tools/mm directory can be used to query the
>>>>>  number of times a page is mapped.
>>>>> diff --git a/fs/proc/internal.h b/fs/proc/internal.h
>>>>> index 1695509370b88..16aa1fd260771 100644
>>>>> --- a/fs/proc/internal.h
>>>>> +++ b/fs/proc/internal.h
>>>>> @@ -174,6 +174,37 @@ static inline int folio_precise_page_mapcount(struct folio *folio,
>>>>>  	return mapcount;
>>>>>  }
>>>>> +/**
>>>>> + * folio_average_page_mapcount() - Average number of mappings per page in this
>>>>> + *                                 folio
>>>>> + * @folio: The folio.
>>>>> + *
>>>>> + * The average number of present user page table entries that reference each
>>>>> + * page in this folio as tracked via the RMAP: either referenced directly
>>>>> + * (PTE) or as part of a larger area that covers this page (e.g., PMD).
>>>>> + *
>>>>> + * Returns: The average number of mappings per page in this folio. 0 for
>>>>> + * folios that are not mapped to user space or are not tracked via the RMAP
>>>>> + * (e.g., shared zeropage).
>>>>> + */
>>>>> +static inline int folio_average_page_mapcount(struct folio *folio)
>>>>> +{
>>>>> +	int mapcount, entire_mapcount;
>>>>> +	unsigned int adjust;
>>>>> +
>>>>> +	if (!folio_test_large(folio))
>>>>> +		return atomic_read(&folio->_mapcount) + 1;
>>>>> +
>>>>> +	mapcount = folio_large_mapcount(folio);
>>>>> +	entire_mapcount = folio_entire_mapcount(folio);
>>>>> +	if (mapcount <= entire_mapcount)
>>>>> +		return entire_mapcount;
>>>>> +	mapcount -= entire_mapcount;
>>>>> +
>>>>> +	adjust = folio_large_nr_pages(folio) / 2;
>>>
>>> Thanks for the review!
>>>
>>>>
>>>> Is there any reason for choosing this adjust number? A comment might be
>>>> helpful in case people want to change it later, either with some reasoning
>>>> or just saying it is chosen empirically.
>>>
>>> We're dividing by folio_large_nr_pages(folio) (shifting by folio_large_order(folio)), so this is not a magic number at all.
>>>
>>> So this should be "ordinary" rounding.
>>
>> I thought the rounding would be (mapcount + 511) / 512.
>
> Yes, that's "rounding up".
>
>> But
>> that means if one subpage is mapped, the average will be 1.
>> Your rounding means if at least half of the subpages is mapped,
>> the average will be 1. Others might think 1/3 is mapped,
>> the average will be 1. That is why I think adjust looks like
>> a magic number.
>
> I think all callers could tolerate (or benefit) from folio_average_page_mapcount() returning at least 1 in case any page is mapped.
>
> There was a reason why I decided to round to the nearest integer instead.
>
> Let me think about this once more, I went back and forth a couple of times on this.

Sure. Your current choice might be good enough for now. My intent in adding
a comment here is just to let people know that the adjust value can be changed
in the future. :)

Best Regards,
Yan, Zi
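
For reference, the two rounding variants discussed above can be compared in a
small userspace sketch. This is not the kernel code: the helper names
avg_round_nearest() and avg_round_up() are made up for illustration, and it
assumes that adjust is added to the mapcount before dividing by the number of
pages, which the quoted part of the hunk does not show.

```c
#include <assert.h>

/*
 * Userspace sketch, not kernel code. "mapcount" stands for the number of
 * per-page mappings summed over the folio and "nr_pages" for the folio
 * size in pages. Both helper names are hypothetical.
 */

/* Round to nearest: add half the divisor before dividing. */
static unsigned int avg_round_nearest(unsigned int mapcount,
				      unsigned int nr_pages)
{
	assert(nr_pages > 0);
	return (mapcount + nr_pages / 2) / nr_pages;
}

/* Round up: any nonzero mapcount yields an average of at least 1. */
static unsigned int avg_round_up(unsigned int mapcount,
				 unsigned int nr_pages)
{
	assert(nr_pages > 0);
	return (mapcount + nr_pages - 1) / nr_pages;
}
```

With 512 pages, a single mapped subpage gives 0 under round-to-nearest and 1
under round-up, while 256 mapped subpages give 1 under both.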