Re: [PATCH v2 16/20] fs/proc/page: remove per-page mapcount dependency for /proc/kpagecount (CONFIG_NO_PAGE_MAPCOUNT)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 24 Feb 2025, at 16:02, David Hildenbrand wrote:

> On 24.02.25 21:40, Zi Yan wrote:
>> On Mon Feb 24, 2025 at 11:55 AM EST, David Hildenbrand wrote:
>>> Let's implement an alternative when per-page mapcounts in large folios
>>> are no longer maintained -- soon with CONFIG_NO_PAGE_MAPCOUNT.
>>>
>>> For large folios, we'll return the per-page average mapcount within the
>>> folio, except when the average is 0 but the folio is mapped: then we
>>> return 1.
>>>
>>> For hugetlb folios and for large folios that are fully mapped
>>> into all address spaces, there is no change.
>>>
>>> As an alternative, we could simply return 0 for non-hugetlb large folios,
>>> or disable this legacy interface with CONFIG_NO_PAGE_MAPCOUNT.
>>>
>>> But the information exposed by this interface can still be valuable, and
>>> frequently we deal with fully-mapped large folios where the average
>>> corresponds to the actual page mapcount. So we'll leave it like this for
>>> now and document the new behavior.
>>>
>>> Note: this interface is likely not very relevant for performance. If
>>> ever required, we could try doing a rather expensive rmap walk to collect
>>> precisely how often this folio page is mapped.
>>>
>>> Signed-off-by: David Hildenbrand <david@xxxxxxxxxx>
>>> ---
>>>   Documentation/admin-guide/mm/pagemap.rst |  7 +++++-
>>>   fs/proc/internal.h                       | 31 ++++++++++++++++++++++++
>>>   fs/proc/page.c                           | 19 ++++++++++++---
>>>   3 files changed, 53 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/Documentation/admin-guide/mm/pagemap.rst b/Documentation/admin-guide/mm/pagemap.rst
>>> index caba0f52dd36c..49590306c61a0 100644
>>> --- a/Documentation/admin-guide/mm/pagemap.rst
>>> +++ b/Documentation/admin-guide/mm/pagemap.rst
>>> @@ -42,7 +42,12 @@ There are four components to pagemap:
>>>      skip over unmapped regions.
>>>     * ``/proc/kpagecount``.  This file contains a 64-bit count of the number of
>>> -   times each page is mapped, indexed by PFN.
>>> +   times each page is mapped, indexed by PFN. Some kernel configurations do
>>> +   not track the precise number of times a page part of a larger allocation
>>> +   (e.g., THP) is mapped. In these configurations, the average number of
>>> +   mappings per page in this larger allocation is returned instead. However,
>>> +   if any page of the large allocation is mapped, the returned value will
>>> +   be at least 1.
>>>    The page-types tool in the tools/mm directory can be used to query the
>>>   number of times a page is mapped.
>>> diff --git a/fs/proc/internal.h b/fs/proc/internal.h
>>> index 1695509370b88..16aa1fd260771 100644
>>> --- a/fs/proc/internal.h
>>> +++ b/fs/proc/internal.h
>>> @@ -174,6 +174,37 @@ static inline int folio_precise_page_mapcount(struct folio *folio,
>>>   	return mapcount;
>>>   }
>>>  +/**
>>> + * folio_average_page_mapcount() - Average number of mappings per page in this
>>> + *				   folio
>>> + * @folio: The folio.
>>> + *
>>> + * The average number of present user page table entries that reference each
>>> + * page in this folio as tracked via the RMAP: either referenced directly
>>> + * (PTE) or as part of a larger area that covers this page (e.g., PMD).
>>> + *
>>> + * Returns: The average number of mappings per page in this folio. 0 for
>>> + * folios that are not mapped to user space or are not tracked via the RMAP
>>> + * (e.g., shared zeropage).
>>> + */
>>> +static inline int folio_average_page_mapcount(struct folio *folio)
>>> +{
>>> +	int mapcount, entire_mapcount;
>>> +	unsigned int adjust;
>>> +
>>> +	if (!folio_test_large(folio))
>>> +		return atomic_read(&folio->_mapcount) + 1;
>>> +
>>> +	mapcount = folio_large_mapcount(folio);
>>> +	entire_mapcount = folio_entire_mapcount(folio);
>>> +	if (mapcount <= entire_mapcount)
>>> +		return entire_mapcount;
>>> +	mapcount -= entire_mapcount;
>>> +
>>> +	adjust = folio_large_nr_pages(folio) / 2;
>
> Thanks for the review!
>
>>
>> Is there any reason for choosing this adjust number? A comment might be
>> helpful in case people want to change it later, either with some reasoning
>> or just saying it is chosen empirically.
>
> We're dividing by folio_large_nr_pages(folio) (shifting by folio_large_order(folio)), so this is not a magic number at all.
>
> So this should be "ordinary" rounding.

I thought the rounding would be (mapcount + 511) / 512. But
that means if one subpage is mapped, the average will be 1.
Your rounding means if at least half of the subpages is mapped,
the average will be 1. Others might think 1/3 is mapped,
the average will be 1. That is why I think adjust looks like
a magic number.

>
> Assume nr_pages = 512.
>
> With 255 we want to round down, with 256 we want to round up.
>
> 255 / 512 = 0 :)
> 256 / 512 = 0 :(
>
> Compared to:
>
> (255 + (512 / 2)) / 512 = (255 + 256) / 512 = 0 :)
> (256 + (512 / 2)) / 512 = (256 + 256) / 512 = 1 :)
>
>>
>>> +	return ((mapcount + adjust) >> folio_large_order(folio)) +
>>> +		entire_mapcount;
>>> +}
>>>   /*
>>>    * array.c
>>>    */
>>> diff --git a/fs/proc/page.c b/fs/proc/page.c
>>> index a55f5acefa974..4d3290cc69667 100644
>>> --- a/fs/proc/page.c
>>> +++ b/fs/proc/page.c
>>> @@ -67,9 +67,22 @@ static ssize_t kpagecount_read(struct file *file, char __user *buf,
>>>   		 * memmaps that were actually initialized.
>>>   		 */
>>>   		page = pfn_to_online_page(pfn);
>>> -		if (page)
>>> -			mapcount = folio_precise_page_mapcount(page_folio(page),
>>> -							       page);
>>> +		if (page) {
>>> +			struct folio *folio = page_folio(page);
>>> +
>>> +			if (IS_ENABLED(CONFIG_PAGE_MAPCOUNT)) {
>>> +				mapcount = folio_precise_page_mapcount(folio, page);
>>> +			} else {
>>> +				/*
>>> +				 * Indicate the per-page average, but at least "1" for
>>> +				 * mapped folios.
>>> +				 */
>>> +				mapcount = folio_average_page_mapcount(folio);
>>> +				if (!mapcount && folio_test_large(folio) &&
>>> +				    folio_mapped(folio))
>>> +					mapcount = 1;
>>
>> This should be part of folio_average_page_mapcount() right?
>
> No, that's not desired.
>
>> Otherwise, the comment on folio_average_page_mapcount() is not correct,
>> since it can return 0 when a folio is mapped to user space.
>
> It's misleading. I'll clarify the comment, probably simply saying:
>
> Returns: The average number of mappings per page in this folio.

Got it.

Best Regards,
Yan, Zi





[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux