Re: A mapcount riddle

Michal Hocko <mhocko@xxxxxxxx> · Wed, 25 Jan 2023 09:24:27 +0100

On Tue 24-01-23 12:56:24, Mike Kravetz wrote:
> Q How can a page be mapped into multiple processes and have a
>   mapcount of 1?
> 
> A It is a hugetlb page referenced by a shared PMD.
> 
> I was looking to expose some basic information about PMD sharing via
> /proc/smaps.  After adding the code, I started a couple processes
> sharing a large hugetlb mapping that would result in the use of
> shared PMDs.  When I looked at the output of /proc/smaps, I saw
> my new metric counting the number of shared PMDs.  However, what
> stood out was that the entire mapping was listed as Private_Hugetlb.
> WTH???  It certainly was shared!

It's been quite some time since I had to look into this area, but
pmd-shared hugetlb pages have always been quite weird AFAIR.

> The routine smaps_hugetlb_range
> decides between Private_Hugetlb and Shared_Hugetlb with this code:
> 
> 	if (page) {
> 		int mapcount = page_mapcount(page);
> 
> 		if (mapcount >= 2)
> 			mss->shared_hugetlb += huge_page_size(hstate_vma(vma));
> 		else
> 			mss->private_hugetlb += huge_page_size(hstate_vma(vma));
> 	}
> 
> After spending some time looking for issues in the page_mapcount code,
> I came to the realization that the mapcount of hugetlb pages only
> referenced by a shared PMD would be 1 no matter how many processes had
> mapped the page.  When a page is first faulted, the mapcount is set to 1.
> When faulted in other processes, the shared PMD is added to the page
> table of the other processes.  No increase of mapcount will occur.

Yes, this is really subtle, but from the hugetlb POV it is the page
table that is shared rather than the underlying page. Is this
distinction useful/reasonable to userspace? Not really, but pmd
sharing is quite hard to stumble over by accident and I suspect most
users of this feature have just got used to those peculiarities.
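
To spell out the accounting as I understand it, here is a toy userspace
model. It is not kernel code, and every name in it (toy_page, toy_pmd,
toy_first_fault, ...) is made up for illustration. The only point is
that attaching to an already populated PMD table bumps the table's share
count and leaves the page's mapcount alone.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only, NOT kernel code. All names are hypothetical. */
struct toy_page {
	int mapcount;			/* stands in for the page's mapcount */
};

struct toy_pmd {
	struct toy_page *page;		/* the huge page this table maps */
	int sharers;			/* processes using this PMD table */
};

/* First fault: the page is mapped through the table, mapcount becomes 1. */
static void toy_first_fault(struct toy_pmd *pmd, struct toy_page *page)
{
	pmd->page = page;
	pmd->sharers = 1;
	page->mapcount = 1;
}

/*
 * Another process attaches to the existing PMD table. Only the table's
 * share count grows; the page's mapcount is not touched.
 */
static void toy_share_pmd(struct toy_pmd *pmd)
{
	pmd->sharers++;
}

/* Same decision as the quoted smaps check: shared iff mapcount >= 2. */
static bool toy_reported_shared(const struct toy_page *page)
{
	return page->mapcount >= 2;
}

/* Two processes end up using the page, yet it is reported as private. */
static void toy_demo(void)
{
	struct toy_page page = { 0 };
	struct toy_pmd pmd = { 0 };

	toy_first_fault(&pmd, &page);	/* process A faults the page */
	toy_share_pmd(&pmd);		/* process B shares A's PMD table */

	assert(pmd.sharers == 2);
	assert(page.mapcount == 1);
	assert(!toy_reported_shared(&page));
}
```

Calling toy_demo() from a main() is enough to exercise it. The real
code paths are of course more involved than this.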

> At first thought this seems bad.  However, I believe this has been the
> behavior since hugetlb PMD sharing was introduced in 2006 and I am
> unaware of any reported issues.  I did an audit of code looking at
> mapcount.  In addition to the above issue with smaps, there appears
> to be an issue with 'migrate_pages' where shared pages could be migrated
> without appropriate privilege.
> 
> 	/* With MPOL_MF_MOVE, we migrate only unshared hugepage. */
> 	if (flags & (MPOL_MF_MOVE_ALL) ||
> 	    (flags & MPOL_MF_MOVE && page_mapcount(page) == 1)) {
> 		if (isolate_hugetlb(page, qp->pagelist) &&
> 			(flags & MPOL_MF_STRICT))
> 			/*
> 			 * Failed to isolate page but allow migrating pages
> 			 * which have been queued.
> 			 */
> 			ret = 1;
> 	}

Could you elaborate on what is problematic about that? The whole pmd
sharing is a cooperative thing. So if one of the processes decides to
migrate the page, why should that be a problem for the others sharing
that page via the page table? Am I missing something obvious?
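
Just to check that we are reading the quoted condition the same way, it
reduces to the toy predicate below. This is a userspace sketch, not
kernel code; the TOY_* flag names and toy_would_queue() are made up and
only mirror the shape of the quoted test. A PMD-shared page reports a
mapcount of 1, so it passes the "unshared" test for a plain move request.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy flags, hypothetical stand-ins for MPOL_MF_MOVE / MPOL_MF_MOVE_ALL. */
#define TOY_MF_MOVE	(1u << 0)
#define TOY_MF_MOVE_ALL	(1u << 1)

/*
 * Mirrors the shape of the quoted condition: queue the huge page for
 * migration if "move all" was requested, or if a plain move was
 * requested and the page looks unshared (mapcount == 1).
 */
static bool toy_would_queue(unsigned int flags, int mapcount)
{
	return (flags & TOY_MF_MOVE_ALL) ||
	       ((flags & TOY_MF_MOVE) && mapcount == 1);
}

static void toy_migrate_demo(void)
{
	/* Page mapped by two processes via separate page tables. */
	assert(!toy_would_queue(TOY_MF_MOVE, 2));

	/*
	 * Page used by many processes through a shared PMD: mapcount
	 * stays at 1, so the plain move request queues it.
	 */
	assert(toy_would_queue(TOY_MF_MOVE, 1));

	/* "Move all" does not look at mapcount at all. */
	assert(toy_would_queue(TOY_MF_MOVE_ALL, 5));
}
```

So the check itself behaves as you describe. My question is only whether
that outcome is actually harmful.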

> I will prepare fixes for both of these.  However, I wanted to ask if
> anyone has ideas about other potential issues with this?
> 
> Since COW is mostly relevant to private mappings, shared PMDs generally
> do not apply.  Nothing stood out in a quick audit of code.

I am pretty sure there are other corner cases lurking in this area which
are really hard to spot until you stumble over them. Fixing the shared
mapping reporting is probably a good thing, but I am not sure why the
migration is a real problem.

Thanks!
-- 
Michal Hocko
SUSE Labs