On 1/21/2023 3:47 AM, Matthew Wilcox wrote: > On Thu, Jan 19, 2023 at 09:24:16AM +0800, Yin, Fengwei wrote: >> On 1/19/2023 7:22 AM, Vishal Moola (Oracle) wrote: >>> @@ -1022,27 +1022,23 @@ static long do_get_mempolicy(int *policy, nodemask_t *nmask, >>> } >>> >>> #ifdef CONFIG_MIGRATION >>> -/* >>> - * page migration, thp tail pages can be passed. >>> - */ >>> -static int migrate_page_add(struct page *page, struct list_head *pagelist, >>> +static int migrate_folio_add(struct folio *folio, struct list_head *foliolist, >>> unsigned long flags) >>> { >>> - struct page *head = compound_head(page); >>> /* >>> - * Avoid migrating a page that is shared with others. >>> + * Avoid migrating a folio that is shared with others. >>> */ >>> - if ((flags & MPOL_MF_MOVE_ALL) || page_mapcount(head) == 1) { >>> - if (!isolate_lru_page(head)) { >>> - list_add_tail(&head->lru, pagelist); >>> - mod_node_page_state(page_pgdat(head), >>> - NR_ISOLATED_ANON + page_is_file_lru(head), >>> - thp_nr_pages(head)); >>> + if ((flags & MPOL_MF_MOVE_ALL) || folio_mapcount(folio) == 1) { >> One question to the page_mapcount -> folio_mapcount here. >> >> For a large folio with 0 entire mapcount, if the first sub-page and any >> other sub-page are mapped, page_mapcount(head) == 1 is true while >> folio_mapcount(folio) == 1 is not. > > We had a good discussion about this in today's THP Cabal meeting [1]. I > didn't quite check everything that I said was true, so let me summarise > & correct it now ... > > - This is a heuristic. We're trying to see whether this folio is > mapped by multiple processes (because if it is, it's probably not > worth migrating). If the heuristic is wrong, it probably doesn't > matter _too_ much? Agree. > - A proper heuristic for this would be > folio_total_mapcount(folio) == folio_nr_pages(folio) I am not sure. File folio can be partially mapped. Maybe following check? for each sub-pages: (folio_entire_mapcount(folio) + sub-pages->_mapcount) <= 1 But it's also expensive to check all sub-pages. Maybe a bit in folio if filio mapped to only one process is really important? > but this would be expensive to calculate as it requires examining > 512 cachelines for a 2MB page. > - For a large folio which is smaller than PMD size, we're guaranteed > that folio_mapcount() is 0 today. My understanding is: for large folio, if any sub-page is mapped, folio_mapcount() can not be 0. > - In the meeting I said that page_mapcount() of the head of a THP > page was zero; that's not true; I had forgotten that we added in > entire_mapcount to the individual page mapcount. > > so I now think this should be: > > page_mapcount(folio_page(folio, 0)) For file large folio, it's possible folio_page(folio, 0) mapped only once, other sub-pages mapped multiple times. But I think this maybe the best choice here. > > with an explanation that checking every page is too heavy-weight. > Maybe it should be its own function: > > static inline int folio_estimated_mapcount(folio) > { > return page_mapcount(folio_page(folio, 0)); > } > > with a nice comment explaining what's going on. > > [1] https://www.youtube.com/watch?v=A3PoGQQQD3Q is the recording of > today's meeting. This is nice. Thanks a lot for sharing. Regards Yin, Fengwei