On 29.09.24 04:19, Miaohe Lin wrote:
On 2024/9/29 10:04, Kefeng Wang wrote:
On 2024/9/29 9:16, Miaohe Lin wrote:
On 2024/9/28 16:39, David Hildenbrand wrote:
On 28.09.24 10:34, David Hildenbrand wrote:
On 28.09.24 06:55, Matthew Wilcox wrote:
On Tue, Aug 27, 2024 at 07:47:24PM +0800, Kefeng Wang wrote:
Directly use a folio for HugeTLB and THP when calculate the next pfn, then
remove unused head variable.
I just noticed this got merged. You're going to hit BUG_ON with it.
- if (PageHuge(page)) {
- pfn = page_to_pfn(head) + compound_nr(head) - 1;
- isolate_hugetlb(folio, &source);
- continue;
- } else if (PageTransHuge(page))
- pfn = page_to_pfn(head) + thp_nr_pages(page) - 1;
+ /*
+ * No reference or lock is held on the folio, so it might
+ * be modified concurrently (e.g. split). As such,
+ * folio_nr_pages() may read garbage. This is fine as the outer
+ * loop will revisit the split folio later.
+ */
+ if (folio_test_large(folio)) {
But it's not fine. Look at the implementation of folio_test_large():
static inline bool folio_test_large(const struct folio *folio)
{
return folio_test_head(folio);
}
That's going to be provided by:
#define FOLIO_TEST_FLAG(name, page) \
static __always_inline bool folio_test_##name(const struct folio *folio) \
{ return test_bit(PG_##name, const_folio_flags(folio, page)); }
and here's the BUG:
static const unsigned long *const_folio_flags(const struct folio *folio,
unsigned n)
{
const struct page *page = &folio->page;
VM_BUG_ON_PGFLAGS(PageTail(page), page);
VM_BUG_ON_PGFLAGS(n > 0 && !test_bit(PG_head, &page->flags), page);
return &page[n].flags;
}
(this page can be transformed from a head page to a tail page because,
as the comment notes, we don't hold a reference.
Please back this out.
Should we generalize the approach in dump_folio() to locally copy a
folio, so we can safely perform checks before deciding whether we want
to try grabbing a reference on the real folio (if it's still a folio :) )?
Oh, and I forgot: isn't the existing code already racy?
PageTransHuge() -> VM_BUG_ON_PAGE(PageTail(page), page);
Yes, in v1[1], I asked same question for existing code for PageTransHuge(page),
"If the page is a tail page, we will BUG_ON(DEBUG_VM enabled) here,
but it seems that we don't guarantee the page won't be a tail page."
we could delay the calculation after we got a ref, but the traversal of pfn may slow down a little if hint a tail pfn, is it acceptable?
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1786,15 +1786,6 @@ static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
page = pfn_to_page(pfn);
folio = page_folio(page);
- /*
- * No reference or lock is held on the folio, so it might
- * be modified concurrently (e.g. split). As such,
- * folio_nr_pages() may read garbage. This is fine as the outer
- * loop will revisit the split folio later.
- */
- if (folio_test_large(folio))
- pfn = folio_pfn(folio) + folio_nr_pages(folio) - 1;
-
/*
* HWPoison pages have elevated reference counts so the migration would
* fail on them. It also doesn't make any sense to migrate them in the
@@ -1807,6 +1798,8 @@ static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
folio_isolate_lru(folio);
if (folio_mapped(folio))
unmap_poisoned_folio(folio, TTU_IGNORE_MLOCK);
+ if (folio_test_large(folio))
+ pfn = folio_pfn(folio) + folio_nr_pages(folio) - 1;
continue;
}
@@ -1823,6 +1816,9 @@ static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
dump_page(page, "isolation failed");
}
}
+
+ if (folio_test_large(folio))
+ pfn = folio_pfn(folio) + folio_nr_pages(folio) - 1;
put_folio:
folio_put(folio);
}
do_migrate_range is called after start_isolate_page_range(). So a page might not be able to
transform from a head page to a tail page as it's isolated?
start_isolate_page_range() is only isolate free pages, so maybe irrelevant.
A page transform from a head page to a tail page should through the below steps:
1. The compound page is freed into buddy.
2. It's merged into larger order in buddy.
3. It's allocated as a larger order compound page.
Since it is isolated, I think step 2 or 3 cannot happen. Or am I miss something?
By isolated, you mean that the pageblock is isolated, and all free pages
are in the MIGRATE_ISOLATE buddy list. Nice observation.
Indeed, a tail page could become a head page (concurrent split is
possible), but a head page should not become a tail for the reason you
mention.
Even mm/page_reporting.c will skip isolated pageblocks.
I wonder if there are some corner cases, but nothing comes to mind that
would perform compound allocations from the MIGRATE_ISOLATE list.
--
Cheers,
David / dhildenb