On 14 Feb 2024, at 5:38, Ryan Roberts wrote:

> On 13/02/2024 21:55, Zi Yan wrote:
>> From: Zi Yan <ziy@xxxxxxxxxx>
>>
>> To split a THP to any lower order (except order-1) pages, we need to
>> reform THPs on subpages at given order and add page refcount based on the
>> new page order. Also we need to reinitialize page_deferred_list after
>> removing the page from the split_queue, otherwise a subsequent split will
>> see list corruption when checking the page_deferred_list again.
>>
>> It has many uses, like minimizing the number of pages after
>> truncating a huge pagecache page. For anonymous THPs, we can only split
>> them to order-0 like before until we add support for any size anonymous
>> THPs.
>
> multi-size THP is now upstream. Not sure if this comment still makes sense.

I will change it to reflect the fact that multi-size THP is already upstream.

> Still its not completely clear to me how you would integrate this new machinery
> and decide what non-zero order to split anon THP to?

Originally, it was developed along with my 1GB THP support, so it was intended
to split order-18 to order-9. But for now, as you and David said in the cover
letter email thread, we might not want to use it for anonymous large folios
until we find a necessary use case.

>>
>> Order-1 folio is not supported because _deferred_list, which is used by
>> partially mapped folios, is stored in subpage 2 and an order-1 folio only
>> has subpage 0 and 1.
>>
>> Signed-off-by: Zi Yan <ziy@xxxxxxxxxx>
>> ---
>>  include/linux/huge_mm.h |  21 +++++---
>>  mm/huge_memory.c        | 114 +++++++++++++++++++++++++++++++---------
>>  2 files changed, 101 insertions(+), 34 deletions(-)
>>
>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
>> index 5adb86af35fc..de0c89105076 100644
>> --- a/include/linux/huge_mm.h
>> +++ b/include/linux/huge_mm.h
>> @@ -265,10 +265,11 @@ unsigned long thp_get_unmapped_area(struct file *filp, unsigned long addr,
>>
>>  void folio_prep_large_rmappable(struct folio *folio);
>>  bool can_split_folio(struct folio *folio, int *pextra_pins);
>> -int split_huge_page_to_list(struct page *page, struct list_head *list);
>> +int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
>> +		unsigned int new_order);
>>  static inline int split_huge_page(struct page *page)
>>  {
>> -	return split_huge_page_to_list(page, NULL);
>> +	return split_huge_page_to_list_to_order(page, NULL, 0);
>>  }
>>  void deferred_split_folio(struct folio *folio);
>>
>> @@ -422,7 +423,8 @@ can_split_folio(struct folio *folio, int *pextra_pins)
>>  	return false;
>>  }
>>  static inline int
>> -split_huge_page_to_list(struct page *page, struct list_head *list)
>> +split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
>> +		unsigned int new_order)
>>  {
>>  	return 0;
>>  }
>> @@ -519,17 +521,20 @@ static inline bool thp_migration_supported(void)
>>  }
>>  #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
>>
>> -static inline int split_folio_to_list(struct folio *folio,
>> -		struct list_head *list)
>> +static inline int split_folio_to_list_to_order(struct folio *folio,
>> +		struct list_head *list, int new_order)
>>  {
>> -	return split_huge_page_to_list(&folio->page, list);
>> +	return split_huge_page_to_list_to_order(&folio->page, list, new_order);
>>  }
>>
>> -static inline int split_folio(struct folio *folio)
>> +static inline int split_folio_to_order(struct folio *folio, int new_order)
>> {
>> -	return split_folio_to_list(folio, NULL);
>> +	return split_folio_to_list_to_order(folio, NULL, new_order);
>> }
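
For reference, here is a rough sketch (hypothetical caller, not part of this
patch) of how a pagecache user such as truncation could use the new interface;
it assumes the caller already holds the folio lock and a reference, as
split_huge_page() requires today:

	/* Illustrative only: split @folio down to order-@new_order folios. */
	static int try_split_folio_to_order(struct folio *folio, unsigned int new_order)
	{
		/* The folio must be locked and referenced by the caller. */
		if (!can_split_folio(folio, NULL))
			return -EAGAIN;

		/* new_order == 0 keeps the old behavior of splitting to base pages. */
		return split_folio_to_order(folio, new_order);
	}

The point of the extra new_order parameter is that a caller which only needs to
drop a few subpages after a truncate can keep the rest as one smaller large
folio instead of splitting all the way to base pages.
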
>> +#define split_folio_to_list(f, l) split_folio_to_list_to_order(f, l, 0)
>> +#define split_folio(f) split_folio_to_order(f, 0)
>> +
>>  /*
>>   * archs that select ARCH_WANTS_THP_SWAP but don't support THP_SWP due to
>>   * limitations in the implementation like arm64 MTE can override this to
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index ad7133c97428..d0e555a8ea98 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -2718,11 +2718,14 @@ void vma_adjust_trans_huge(struct vm_area_struct *vma,
>>
>>  static void unmap_folio(struct folio *folio)
>>  {
>> -	enum ttu_flags ttu_flags = TTU_RMAP_LOCKED | TTU_SPLIT_HUGE_PMD |
>> -		TTU_SYNC | TTU_BATCH_FLUSH;
>> +	enum ttu_flags ttu_flags = TTU_RMAP_LOCKED | TTU_SYNC |
>> +		TTU_BATCH_FLUSH;
>>
>>  	VM_BUG_ON_FOLIO(!folio_test_large(folio), folio);
>>
>> +	if (folio_test_pmd_mappable(folio))
>> +		ttu_flags |= TTU_SPLIT_HUGE_PMD;
>
> Should we split this change out? I think it makes sense independent of this series?
>

Sure. Since multi-size THP is upstream, this avoids an unnecessary code path
if the THP is not PMD-mapped.

>> +
>>  	/*
>>  	 * Anon pages need migration entries to preserve them, but file
>>  	 * pages can simply be left unmapped, then faulted back on demand.
>> @@ -2756,7 +2759,6 @@ static void lru_add_page_tail(struct page *head, struct page *tail,
>>  		struct lruvec *lruvec, struct list_head *list)
>>  {
>>  	VM_BUG_ON_PAGE(!PageHead(head), head);
>> -	VM_BUG_ON_PAGE(PageCompound(tail), head);
>>  	VM_BUG_ON_PAGE(PageLRU(tail), head);
>>  	lockdep_assert_held(&lruvec->lru_lock);
>>
>> @@ -2777,7 +2779,8 @@ static void lru_add_page_tail(struct page *head, struct page *tail,
>>  }
>>
>>  static void __split_huge_page_tail(struct folio *folio, int tail,
>> -		struct lruvec *lruvec, struct list_head *list)
>> +		struct lruvec *lruvec, struct list_head *list,
>> +		unsigned int new_order)
>>  {
>>  	struct page *head = &folio->page;
>>  	struct page *page_tail = head + tail;
>> @@ -2847,10 +2850,15 @@ static void __split_huge_page_tail(struct folio *folio, int tail,
>>  	 * which needs correct compound_head().
>>  	 */
>>  	clear_compound_head(page_tail);
>> +	if (new_order) {
>> +		prep_compound_page(page_tail, new_order);
>> +		folio_prep_large_rmappable(page_folio(page_tail));
>> +	}
>>
>>  	/* Finally unfreeze refcount. Additional reference from page cache. */
>> -	page_ref_unfreeze(page_tail, 1 + (!folio_test_anon(folio) ||
>> -					  folio_test_swapcache(folio)));
>> +	page_ref_unfreeze(page_tail,
>> +		1 + ((!folio_test_anon(folio) || folio_test_swapcache(folio)) ?
>> +			     folio_nr_pages(page_folio(page_tail)) : 0));
>>
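
To make the new unfreeze value concrete, a worked example (illustrative numbers
only): when an order-9 pagecache folio is split to new_order = 2, each
resulting folio has folio_nr_pages() == 4, so it is unfrozen to

	1 + folio_nr_pages() = 1 + 4 = 5

that is, one base reference plus one page cache reference per subpage of the
new folio. An anonymous folio that is not in the swap cache is still unfrozen
to 1, as before.
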
>>  	if (folio_test_young(folio))
>>  		folio_set_young(new_folio);
>> @@ -2868,7 +2876,7 @@ static void __split_huge_page_tail(struct folio *folio, int tail,
>>  }
>>
>>  static void __split_huge_page(struct page *page, struct list_head *list,
>> -		pgoff_t end)
>> +		pgoff_t end, unsigned int new_order)
>>  {
>>  	struct folio *folio = page_folio(page);
>>  	struct page *head = &folio->page;
>> @@ -2877,10 +2885,11 @@ static void __split_huge_page(struct page *page, struct list_head *list,
>>  	unsigned long offset = 0;
>>  	unsigned int nr = thp_nr_pages(head);
>>  	int i, nr_dropped = 0;
>> +	unsigned int new_nr = 1 << new_order;
>>  	int order = folio_order(folio);
>>
>>  	/* complete memcg works before add pages to LRU */
>> -	split_page_memcg(head, order, 0);
>> +	split_page_memcg(head, order, new_order);
>>
>>  	if (folio_test_anon(folio) && folio_test_swapcache(folio)) {
>>  		offset = swp_offset(folio->swap);
>> @@ -2893,8 +2902,8 @@ static void __split_huge_page(struct page *page, struct list_head *list,
>>
>>  	ClearPageHasHWPoisoned(head);
>>
>> -	for (i = nr - 1; i >= 1; i--) {
>> -		__split_huge_page_tail(folio, i, lruvec, list);
>> +	for (i = nr - new_nr; i >= new_nr; i -= new_nr) {
>> +		__split_huge_page_tail(folio, i, lruvec, list, new_order);
>>  		/* Some pages can be beyond EOF: drop them from page cache */
>>  		if (head[i].index >= end) {
>>  			struct folio *tail = page_folio(head + i);
>> @@ -2910,29 +2919,41 @@ static void __split_huge_page(struct page *page, struct list_head *list,
>>  			__xa_store(&head->mapping->i_pages, head[i].index,
>>  					head + i, 0);
>>  		} else if (swap_cache) {
>> +			/*
>> +			 * split anonymous THPs (including swapped out ones) to
>> +			 * non-zero order not supported
>> +			 */
>> +			VM_WARN_ONCE(new_order,
>> +				"Split swap-cached anon folio to non-0 order not supported");
>
> Why isn't it supported? Even if it's not supported, is this level the right
> place to enforce these kinds of policy decisions? I wonder if we should be
> leaving that to the higher level to decide?

Is the "swap-out small-size THP without splitting" series merged? This would
need that patchset. You are right that a warning here is not appropriate; I
will fail the split instead if the folio is in the swap cache and is going to
be split to a non-zero order.

>>  			__xa_store(&swap_cache->i_pages, offset + i,
>>  					head + i, 0);
>>  		}
>>  	}

--
Best Regards,
Yan, Zi