On 2024/9/29 10:04, Kefeng Wang wrote:
>
>
> On 2024/9/29 9:16, Miaohe Lin wrote:
>> On 2024/9/28 16:39, David Hildenbrand wrote:
>>> On 28.09.24 10:34, David Hildenbrand wrote:
>>>> On 28.09.24 06:55, Matthew Wilcox wrote:
>>>>> On Tue, Aug 27, 2024 at 07:47:24PM +0800, Kefeng Wang wrote:
>>>>>> Directly use a folio for HugeTLB and THP when calculate the next pfn, then
>>>>>> remove unused head variable.
>>>>>
>>>>> I just noticed this got merged.  You're going to hit BUG_ON with it.
>>>>>
>>>>>> -		if (PageHuge(page)) {
>>>>>> -			pfn = page_to_pfn(head) + compound_nr(head) - 1;
>>>>>> -			isolate_hugetlb(folio, &source);
>>>>>> -			continue;
>>>>>> -		} else if (PageTransHuge(page))
>>>>>> -			pfn = page_to_pfn(head) + thp_nr_pages(page) - 1;
>>>>>> +		/*
>>>>>> +		 * No reference or lock is held on the folio, so it might
>>>>>> +		 * be modified concurrently (e.g. split).  As such,
>>>>>> +		 * folio_nr_pages() may read garbage.  This is fine as the outer
>>>>>> +		 * loop will revisit the split folio later.
>>>>>> +		 */
>>>>>> +		if (folio_test_large(folio)) {
>>>>>
>>>>> But it's not fine.  Look at the implementation of folio_test_large():
>>>>>
>>>>> static inline bool folio_test_large(const struct folio *folio)
>>>>> {
>>>>>	return folio_test_head(folio);
>>>>> }
>>>>>
>>>>> That's going to be provided by:
>>>>>
>>>>> #define FOLIO_TEST_FLAG(name, page)                                     \
>>>>> static __always_inline bool folio_test_##name(const struct folio *folio) \
>>>>> { return test_bit(PG_##name, const_folio_flags(folio, page)); }
>>>>>
>>>>> and here's the BUG:
>>>>>
>>>>> static const unsigned long *const_folio_flags(const struct folio *folio,
>>>>>		unsigned n)
>>>>> {
>>>>>	const struct page *page = &folio->page;
>>>>>
>>>>>	VM_BUG_ON_PGFLAGS(PageTail(page), page);
>>>>>	VM_BUG_ON_PGFLAGS(n > 0 && !test_bit(PG_head, &page->flags), page);
>>>>>	return &page[n].flags;
>>>>> }
>>>>>
>>>>> (this page can be transformed from a head page to a tail page because,
>>>>> as the comment notes, we don't hold a reference.)
>>>>>
>>>>> Please back this out.
>>>>
>>>> Should we generalize the approach in dump_folio() to locally copy a
>>>> folio, so we can safely perform checks before deciding whether we want
>>>> to try grabbing a reference on the real folio (if it's still a folio :) )?
>>>>
>>>
>>> Oh, and I forgot: isn't the existing code already racy?
>>>
>>> PageTransHuge() -> VM_BUG_ON_PAGE(PageTail(page), page);
>
> Yes, in v1[1], I asked the same question about the existing code for PageTransHuge(page):
>
>    "If the page is a tail page, we will BUG_ON(DEBUG_VM enabled) here,
>     but it seems that we don't guarantee the page won't be a tail page."
>
>
> We could delay the calculation until after we have got a ref, but the traversal
> of pfns may slow down a little if we hit a tail pfn. Is that acceptable?
>
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1786,15 +1786,6 @@ static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
>  		page = pfn_to_page(pfn);
>  		folio = page_folio(page);
>
> -		/*
> -		 * No reference or lock is held on the folio, so it might
> -		 * be modified concurrently (e.g. split).  As such,
> -		 * folio_nr_pages() may read garbage.  This is fine as the outer
> -		 * loop will revisit the split folio later.
> -		 */
> -		if (folio_test_large(folio))
> -			pfn = folio_pfn(folio) + folio_nr_pages(folio) - 1;
> -
>  		/*
>  		 * HWPoison pages have elevated reference counts so the migration would
>  		 * fail on them. It also doesn't make any sense to migrate them in the
> @@ -1807,6 +1798,8 @@ static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
>  			folio_isolate_lru(folio);
>  			if (folio_mapped(folio))
>  				unmap_poisoned_folio(folio, TTU_IGNORE_MLOCK);
> +			if (folio_test_large(folio))
> +				pfn = folio_pfn(folio) + folio_nr_pages(folio) - 1;
>  			continue;
>  		}
>
> @@ -1823,6 +1816,9 @@ static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
>  				dump_page(page, "isolation failed");
>  			}
>  		}
> +
> +		if (folio_test_large(folio))
> +			pfn = folio_pfn(folio) + folio_nr_pages(folio) - 1;
>  put_folio:
>  		folio_put(folio);
>  	}
>
>
>>
>> do_migrate_range is called after start_isolate_page_range(). So a page might not be able to
>> transform from a head page to a tail page as it's isolated?
> start_isolate_page_range() only isolates free pages, so it may be irrelevant.

For a page to transform from a head page to a tail page, it should go through
the steps below:

1. The compound page is freed into buddy.
2. It's merged into a larger order in buddy.
3. It's allocated as a larger-order compound page.

Since it is isolated, I think step 2 or 3 cannot happen. Or am I missing something?

Thanks.
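
For reference, the skip arithmetic under discussion (pfn = folio_pfn(folio) + folio_nr_pages(folio) - 1) can be modelled in userspace. This is only a rough sketch and not kernel code: struct model_folio, last_pfn_of() and count_visits() are hypothetical names made up for illustration, and it assumes every folio in the range has the same size and is naturally aligned. It says nothing about the race itself; it only shows how much traversal the skip saves, which is the cost side of delaying the calculation.

```c
#include <assert.h>

/* Hypothetical stand-in for the two folio properties the skip needs. */
struct model_folio {
	unsigned long head_pfn;	/* pfn of the first page of the folio */
	unsigned long nr_pages;	/* number of pages, 1 for a small folio */
};

/*
 * pfn of the last page of the folio. Setting the scan position to this
 * value lets the loop's own pfn++ land on the first page after the folio.
 */
static unsigned long last_pfn_of(const struct model_folio *f)
{
	assert(f->nr_pages > 0);
	return f->head_pfn + f->nr_pages - 1;
}

/*
 * Count loop iterations over [start, end), assuming the whole range is
 * covered by aligned folios of nr_pages pages each. With skip == 0 the
 * loop visits every pfn, including tail pfns.
 */
static unsigned long count_visits(unsigned long start, unsigned long end,
				  unsigned long nr_pages, int skip)
{
	unsigned long pfn, visits = 0;

	for (pfn = start; pfn < end; pfn++) {
		/* Model of page_folio(): any pfn resolves to its head. */
		struct model_folio f = {
			.head_pfn = pfn - (pfn % nr_pages),
			.nr_pages = nr_pages,
		};

		visits++;
		if (skip)
			pfn = last_pfn_of(&f);
	}
	return visits;
}
```

In this model, a 2048-pfn range made of 512-page folios takes 4 iterations with the skip and 2048 without it, and starting on a tail pfn still skips to the end of that folio because the head is looked up first.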