On Thu, Jan 26, 2023 at 1:15 AM David Hildenbrand <david@xxxxxxxxxx> wrote: > > On 25.01.23 17:22, James Houghton wrote: > > On Wed, Jan 25, 2023 at 7:54 AM Peter Xu <peterx@xxxxxxxxxx> wrote: > >> > >> On Wed, Jan 25, 2023 at 07:26:49AM -0800, James Houghton wrote: > >>>> At first thought this seems bad. However, I believe this has been the > >>>> behavior since hugetlb PMD sharing was introduced in 2006 and I am > >>>> unaware of any reported issues. I did a audit of code looking at > >>>> mapcount. In addition to the above issue with smaps, there appears > >>>> to be an issue with 'migrate_pages' where shared pages could be migrated > >>>> without appropriate privilege. > >>>> > >>>> /* With MPOL_MF_MOVE, we migrate only unshared hugepage. */ > >>>> if (flags & (MPOL_MF_MOVE_ALL) || > >>>> (flags & MPOL_MF_MOVE && page_mapcount(page) == 1)) { > >>>> if (isolate_hugetlb(page, qp->pagelist) && > >>>> (flags & MPOL_MF_STRICT)) > >>>> /* > >>>> * Failed to isolate page but allow migrating pages > >>>> * which have been queued. > >>>> */ > >>>> ret = 1; > >>>> } > >>> > >>> This isn't the exact same problem you're fixing Mike, but I want to > >>> point out a related problem. > >>> > >>> This is the generic-mm-equivalent of the hugetlb code above: > >>> > >>> static int migrate_page_add(struct page *page, struct list_head > >>> *pagelist, unsigned long flags) > >>> { > >>> struct page *head = compound_head(page); > >>> /* > >>> * Avoid migrating a page that is shared with others. > >>> */ > >>> if ((flags & MPOL_MF_MOVE_ALL) || page_mapcount(head) == 1) { > >>> if (!isolate_lru_page(head)) { > >>> list_add_tail(&head->lru, pagelist); > >>> mod_node_page_state(page_pgdat(head), > >>> NR_ISOLATED_ANON + page_is_file_lru(head), > >>> thp_nr_pages(head)); > >>> ... > >>> } > >>> > >>> If you have a partially PTE-mapped THP, page_mapcount(head) will not > >>> accurately determine if a page is mapped in multiple VMAs or not (it > >>> only tells you how many times the head page is mapped). > >>> > >>> For example... > >>> 1) You could have the THP PMD-mapped in one VMA, and then one tail > >>> page of the THP can be mapped in another. page_mapcount(head) will be > >>> 1. > >>> 2) You could have two VMAs map two separate tail pages of the THP, in > >>> which case page_mapcount(head) will be 0. > >>> > >>> I bring this up because we have the same problem with HugeTLB > >>> high-granularity mapping. > >> > >> Maybe a better match here is total_mapcount() rather than page_mapcount() > >> (despite the overheads on the sub-page loop)? > > IIRC, total_mapcount() would also not be what we want: for a PTE-mapped > THP it would be e.g. 512 instead of one. [unless I am confused again > about mapcounts] > > See my other comment, I believe this is supposed to be a guesstimate > whether "this page is shared". And we use the first subpage to make a > guess here ... I tried to dig into the git and review history. It seems like the code was added when THP migration support was introduced, and was just copied from the base page case IIRC. I agree it is a heuristic guess about whether this page is shared or not, but reading the head page's mapcount is not correct AFAICT. The total_mapcount should be used, although it can't distinguish unshared PTE mapped (multiple subpages mapped by PTEs) THP, but it could filter out shared pages as expected. > > Of course, we could try harder, by looking at > 1 subpage, to test if > any of these subpages has a mapcount > 1. But it still wouldn't be > accurate .... A little bit overkilling TBH. > > -- > Thanks, > > David / dhildenb > >