On Thu, Jan 19, 2023 at 09:24:16AM +0800, Yin, Fengwei wrote: > On 1/19/2023 7:22 AM, Vishal Moola (Oracle) wrote: > > @@ -1022,27 +1022,23 @@ static long do_get_mempolicy(int *policy, nodemask_t *nmask, > > } > > > > #ifdef CONFIG_MIGRATION > > -/* > > - * page migration, thp tail pages can be passed. > > - */ > > -static int migrate_page_add(struct page *page, struct list_head *pagelist, > > +static int migrate_folio_add(struct folio *folio, struct list_head *foliolist, > > unsigned long flags) > > { > > - struct page *head = compound_head(page); > > /* > > - * Avoid migrating a page that is shared with others. > > + * Avoid migrating a folio that is shared with others. > > */ > > - if ((flags & MPOL_MF_MOVE_ALL) || page_mapcount(head) == 1) { > > - if (!isolate_lru_page(head)) { > > - list_add_tail(&head->lru, pagelist); > > - mod_node_page_state(page_pgdat(head), > > - NR_ISOLATED_ANON + page_is_file_lru(head), > > - thp_nr_pages(head)); > > + if ((flags & MPOL_MF_MOVE_ALL) || folio_mapcount(folio) == 1) { > One question to the page_mapcount -> folio_mapcount here. > > For a large folio with 0 entire mapcount, if the first sub-page and any > other sub-page are mapped, page_mapcount(head) == 1 is true while > folio_mapcount(folio) == 1 is not. We had a good discussion about this in today's THP Cabal meeting [1]. I didn't quite check everything that I said was true, so let me summarise & correct it now ... - This is a heuristic. We're trying to see whether this folio is mapped by multiple processes (because if it is, it's probably not worth migrating). If the heuristic is wrong, it probably doesn't matter _too_ much? - A proper heuristic for this would be folio_total_mapcount(folio) == folio_nr_pages(folio) but this would be expensive to calculate as it requires examining 512 cachelines for a 2MB page. - For a large folio which is smaller than PMD size, we're guaranteed that folio_mapcount() is 0 today. - In the meeting I said that page_mapcount() of the head of a THP page was zero; that's not true; I had forgotten that we added in entire_mapcount to the individual page mapcount. so I now think this should be: page_mapcount(folio_page(folio, 0)) with an explanation that checking every page is too heavy-weight. Maybe it should be its own function: static inline int folio_estimated_mapcount(folio) { return page_mapcount(folio_page(folio, 0)); } with a nice comment explaining what's going on. [1] https://www.youtube.com/watch?v=A3PoGQQQD3Q is the recording of today's meeting.