On Wed, Jan 25, 2023 at 08:22:24AM -0800, James Houghton wrote: > On Wed, Jan 25, 2023 at 7:54 AM Peter Xu <peterx@xxxxxxxxxx> wrote: > > > > On Wed, Jan 25, 2023 at 07:26:49AM -0800, James Houghton wrote: > > > > At first thought this seems bad. However, I believe this has been the > > > > behavior since hugetlb PMD sharing was introduced in 2006 and I am > > > > unaware of any reported issues. I did a audit of code looking at > > > > mapcount. In addition to the above issue with smaps, there appears > > > > to be an issue with 'migrate_pages' where shared pages could be migrated > > > > without appropriate privilege. > > > > > > > > /* With MPOL_MF_MOVE, we migrate only unshared hugepage. */ > > > > if (flags & (MPOL_MF_MOVE_ALL) || > > > > (flags & MPOL_MF_MOVE && page_mapcount(page) == 1)) { > > > > if (isolate_hugetlb(page, qp->pagelist) && > > > > (flags & MPOL_MF_STRICT)) > > > > /* > > > > * Failed to isolate page but allow migrating pages > > > > * which have been queued. > > > > */ > > > > ret = 1; > > > > } > > > > > > This isn't the exact same problem you're fixing Mike, but I want to > > > point out a related problem. > > > > > > This is the generic-mm-equivalent of the hugetlb code above: > > > > > > static int migrate_page_add(struct page *page, struct list_head > > > *pagelist, unsigned long flags) > > > { > > > struct page *head = compound_head(page); > > > /* > > > * Avoid migrating a page that is shared with others. > > > */ > > > if ((flags & MPOL_MF_MOVE_ALL) || page_mapcount(head) == 1) { > > > if (!isolate_lru_page(head)) { > > > list_add_tail(&head->lru, pagelist); > > > mod_node_page_state(page_pgdat(head), > > > NR_ISOLATED_ANON + page_is_file_lru(head), > > > thp_nr_pages(head)); > > > ... > > > } > > > There's also 3 functions in migrate that appear to check for a similar condition - add_page_for_migration(), numamigrate_isolate_page(), and migrate_misplaced_page(). > > > If you have a partially PTE-mapped THP, page_mapcount(head) will not > > > accurately determine if a page is mapped in multiple VMAs or not (it > > > only tells you how many times the head page is mapped). > > > > > > For example... > > > 1) You could have the THP PMD-mapped in one VMA, and then one tail > > > page of the THP can be mapped in another. page_mapcount(head) will be > > > 1. > > > 2) You could have two VMAs map two separate tail pages of the THP, in > > > which case page_mapcount(head) will be 0. > > > > > > I bring this up because we have the same problem with HugeTLB > > > high-granularity mapping. > > > > Maybe a better match here is total_mapcount() rather than page_mapcount() > > (despite the overheads on the sub-page loop)? > > This would kind of fix the problem, but it would be too conservative now. :) I agree. Interestingly, numamigrate_isolate_page() does take the total_mapcount() approach right now, so I'm not sure how much of a difference being more conservative makes. > In both example 1 and 2 above, total_mapcount(head) for both would be > 2, so that's ok. But now consider: you have one VMA that is > PTE-mapping two pieces of the same THP. total_mapcount(head) is still > 2, even though only a single VMA is mapping the page.