On 25.01.23 17:22, James Houghton wrote:
On Wed, Jan 25, 2023 at 7:54 AM Peter Xu <peterx@xxxxxxxxxx> wrote:
On Wed, Jan 25, 2023 at 07:26:49AM -0800, James Houghton wrote:
At first thought this seems bad. However, I believe this has been the
behavior since hugetlb PMD sharing was introduced in 2006 and I am
unaware of any reported issues. I did a audit of code looking at
mapcount. In addition to the above issue with smaps, there appears
to be an issue with 'migrate_pages' where shared pages could be migrated
without appropriate privilege.
/* With MPOL_MF_MOVE, we migrate only unshared hugepage. */
if (flags & (MPOL_MF_MOVE_ALL) ||
(flags & MPOL_MF_MOVE && page_mapcount(page) == 1)) {
if (isolate_hugetlb(page, qp->pagelist) &&
(flags & MPOL_MF_STRICT))
/*
* Failed to isolate page but allow migrating pages
* which have been queued.
*/
ret = 1;
}
This isn't the exact same problem you're fixing Mike, but I want to
point out a related problem.
This is the generic-mm-equivalent of the hugetlb code above:
static int migrate_page_add(struct page *page, struct list_head
*pagelist, unsigned long flags)
{
struct page *head = compound_head(page);
/*
* Avoid migrating a page that is shared with others.
*/
if ((flags & MPOL_MF_MOVE_ALL) || page_mapcount(head) == 1) {
if (!isolate_lru_page(head)) {
list_add_tail(&head->lru, pagelist);
mod_node_page_state(page_pgdat(head),
NR_ISOLATED_ANON + page_is_file_lru(head),
thp_nr_pages(head));
...
}
If you have a partially PTE-mapped THP, page_mapcount(head) will not
accurately determine if a page is mapped in multiple VMAs or not (it
only tells you how many times the head page is mapped).
For example...
1) You could have the THP PMD-mapped in one VMA, and then one tail
page of the THP can be mapped in another. page_mapcount(head) will be
1.
2) You could have two VMAs map two separate tail pages of the THP, in
which case page_mapcount(head) will be 0.
I bring this up because we have the same problem with HugeTLB
high-granularity mapping.
Maybe a better match here is total_mapcount() rather than page_mapcount()
(despite the overheads on the sub-page loop)?
IIRC, total_mapcount() would also not be what we want: for a PTE-mapped
THP it would be e.g. 512 instead of one. [unless I am confused again
about mapcounts]
See my other comment, I believe this is supposed to be a guesstimate
whether "this page is shared". And we use the first subpage to make a
guess here ...
Of course, we could try harder, by looking at > 1 subpage, to test if
any of these subpages has a mapcount > 1. But it still wouldn't be
accurate ....
--
Thanks,
David / dhildenb