On 23.01.23 21:37, Matthew Wilcox wrote:
> On Mon, Jan 23, 2023 at 12:23:46PM -0800, Sidhartha Kumar wrote:
>> @@ -1637,14 +1637,13 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
>>  			continue;
>>  		page = pfn_to_page(pfn);
>>  		folio = page_folio(page);
>> -		head = &folio->page;
>> -		if (PageHuge(page)) {
>> -			pfn = page_to_pfn(head) + compound_nr(head) - 1;
>> +		if (folio_test_hugetlb(folio)) {
>> +			pfn = folio_pfn(folio) + folio_nr_pages(folio) - 1;
>>  			isolate_hugetlb(folio, &source);
>>  			continue;
>> -		} else if (PageTransHuge(page))
>> -			pfn = page_to_pfn(head) + thp_nr_pages(page) - 1;
>> +		} else if (folio_test_transhuge(folio))
>> +			pfn = folio_pfn(folio) + thp_nr_pages(page) - 1;
> I'm pretty sure those two lines should be...
>
> 		} else if (folio_test_large(folio))
> 			pfn = folio_pfn(folio) + folio_nr_pages(folio) - 1;
>
> But, erm ... we're doing this before we have a refcount on the page,
> right? So this is unsafe because the page might change which folio
> it is in. And the folio we found earlier might become a tail page
> of a different folio. (As the comment below explains, HWPoison pages
> won't, so it's not unsafe for them).
>
> Also, thp_nr_pages(page) is going to return 1 for tail pages. So this
> is a noop, unless page is a head page.
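To make the skip arithmetic in the suggested form concrete, here is a small user-space sketch. It is a toy model, not kernel code: struct toy_folio, toy_folio_last_pfn() and toy_scan() are made-up names, and the two fields only stand in for what folio_pfn() and folio_nr_pages() would report. It also assumes the folio is stable while we look at it, which is exactly what the real loop cannot assume.

```c
#include <assert.h>

/* Toy stand-in for a folio: only the two values the arithmetic needs.
 * This is a user-space model, not the kernel's struct folio. */
struct toy_folio {
	unsigned long head_pfn;	/* pfn of the first page in the folio */
	unsigned long nr_pages;	/* number of pages the folio spans */
};

/* Last pfn covered by the folio: head + nr - 1. */
static unsigned long toy_folio_last_pfn(const struct toy_folio *f)
{
	assert(f->nr_pages > 0);
	return f->head_pfn + f->nr_pages - 1;
}

/*
 * Walk [start, end) like the loop in the hunk above: when the current
 * pfn falls inside the folio, jump to the folio's last pfn so that the
 * loop's pfn++ lands on the first pfn after the folio.
 * Returns the number of loop iterations that were needed.
 */
static unsigned long toy_scan(unsigned long start, unsigned long end,
			      const struct toy_folio *f)
{
	unsigned long pfn, iterations = 0;

	for (pfn = start; pfn < end; pfn++) {
		iterations++;
		if (pfn >= f->head_pfn && pfn <= toy_folio_last_pfn(f))
			pfn = toy_folio_last_pfn(f);
	}
	return iterations;
}
```

With a 512-page folio starting at pfn 512, a scan that starts anywhere inside the folio needs a single iteration to get past it. If the page count fed into the formula is 1, nothing is skipped and the scan visits every pfn.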
> It's all a bit confusing, and being memory-hotplug, it's not well
> tested. More thought needed.
Ehm, it is fairly well tested ;)
As memory offlining keeps retrying, temporarily making wrong assumptions
about a folio is acceptable, as long as we don't run into BUGs.
It's certainly worth a big comment in the code that this is all racy and
that the page migration code will stabilize the page.
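A rough user-space sketch of the "keep retrying" idea, with the kind of comment I have in mind. Everything here is hypothetical: struct toy_page, its busy_passes field and both functions are invented for illustration, and a real pass fails for many more reasons than this model covers.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy page. "busy_passes" models a page that cannot be migrated yet,
 * e.g. because the scan acted on a stale view of it (hypothetical). */
struct toy_page {
	bool migrated;
	unsigned int busy_passes;
};

/*
 * One racy pass over the range. Nothing we observe here is stable: a
 * page may be skipped or fail to migrate because what we saw was
 * already out of date. That is acceptable as long as we never crash
 * on it, because the caller simply retries.
 * Returns the number of pages still left to migrate.
 */
static size_t toy_migrate_pass(struct toy_page *pages, size_t n)
{
	size_t i, remaining = 0;

	for (i = 0; i < n; i++) {
		if (pages[i].migrated)
			continue;
		if (pages[i].busy_passes > 0) {
			pages[i].busy_passes--;	/* transient failure, retry later */
			remaining++;
			continue;
		}
		pages[i].migrated = true;
	}
	return remaining;
}

/* Keep retrying until a pass leaves nothing behind; returns passes used. */
static unsigned int toy_offline(struct toy_page *pages, size_t n)
{
	unsigned int passes = 0;

	do {
		passes++;
	} while (toy_migrate_pass(pages, n) > 0);
	return passes;
}
```

In this model, a wrong guess in one pass only costs another pass. The loop converges once the pages stop being transiently busy.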
Now, we could temporarily take a reference, but ... common migration
code will try taking its own ref to stabilize the page and would be
confused about yet another ref (-> migration will fail).
So we have to be careful about grabbing references on these pages, and
how long we're going to hold them. Otherwise we'll break memory
offlining completely :)
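To illustrate why an extra reference gets in the way, here is a toy model loosely inspired by the expected-refcount check that the migration code performs. The names (struct toy_refpage, toy_get(), toy_put(), toy_try_migrate()) and the exact accounting are assumptions made for the sketch, not the kernel's implementation.

```c
#include <stdbool.h>

/* Toy page with nothing but a reference count (user-space model). */
struct toy_refpage {
	int refcount;
};

static void toy_get(struct toy_refpage *p) { p->refcount++; }
static void toy_put(struct toy_refpage *p) { p->refcount--; }

/*
 * Migration takes its own reference to stabilize the page and then
 * expects to see exactly the references it knows about: the
 * "expected_refs" held by legitimate owners plus its own. Any other
 * reference, such as one still held by a scanner, makes the check
 * fail and the migration is refused.
 */
static bool toy_try_migrate(struct toy_refpage *p, int expected_refs)
{
	bool ok;

	toy_get(p);				/* migration's own ref */
	ok = (p->refcount == expected_refs + 1);
	toy_put(p);				/* drop it again either way */
	return ok;
}
```

If the scanner calls toy_get() and is still holding that reference when toy_try_migrate() runs, the migration is refused. Once the scanner calls toy_put(), the same call succeeds. That is why both where we take a reference and how long we hold it matter.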
--
Thanks,
David / dhildenb