On 2024/8/7 19:14, David Hildenbrand wrote: > On 07.08.24 09:39, Miaohe Lin wrote: >> On 2024/8/6 17:29, David Hildenbrand wrote: >>> On 02.08.24 09:50, Kefeng Wang wrote: >>>> >>>> >>>> On 2024/8/2 4:10, David Hildenbrand wrote: >>>>> >>>>>>> >>>>>>> We're not checking the head page here, will this work reliably for >>>>>>> hugetlb? (I recall some difference in per-page hwpoison handling between >>>>>>> hugetlb and THP due to the vmemmap optimization) >>>>>> >>>>>> Before this changes, the hwposioned hugetlb page won't try to unmap in >>>>>> do_migrate_range(), we hope it already unmapped in memory_failure(), as >>>>>> mentioned from comments, there maybe fail to unmap, so a new safeguard >>>>>> to try to unmap it again here, but we don't need to guarantee it. >>>>> >>>>> Thanks for clarifying! >>>>> >>>>> But I do wonder if the PageHWPoison() is the right thing to do for hugetlb. >>>>> >>>>> IIUC, hugetlb requires folio_test_hwpoison() -- testing the head page >>>>> not the subpage. Reason is that due to the vmemmap optimization we might >>>>> not be able to modify subpages to set hwpoison. >>>> >>>> Yes, HVO is special(only head page with hwpoison), since we always want >>>> to check head page here (next pfn = head_pfn + nr), so it might be >>>> enough to only use PageHWpoison, but just in case, adding hwpoison check >>>> for the head page, >>>> >>>> if (unlikely(PageHWPoison(page) || folio_test_hwpoison(folio))) >>> >>> I also do wonder if we have to check for large folios folio_test_has_hwpoison(): >>> if any subpage is poisoned, not just the current page. >>> >> >> IMHO, below if condition [1] should be fine to check for any hwpoisoned folio: >> >> if (folio_test_hwpoison(folio) || >> (folio_test_large(folio) && folio_test_has_hwpoisoned(folio))) { >> >> 1. For raw pages, folio_test_hwpoison(folio) works fine. >> 2. For thp (memory_failure fails to split it in first place), folio_test_has_hwpoisoned(folio) works fine. >> 3. For hugetlb, we always have hwpoison flag set for folio. So folio_test_hwpoison(folio) works fine. It seems I missed one corner case. When memory_failure meets an isolated thp, get_hwpoison_page() will return EIO and thp won't have has_hwpoison flag set. Above pattern might not work with it. :( >> >> But folio might not be the right hwpoisoned page, i.e. subpages might be hwpoisoned instead. >> Or am I miss something? > > Yes, but we can only migrate full folios, and if any subpage is poisoned we're in trouble and have to effectively force-unmap it? Yes, I agree with you. > > At least that's my understanding :) Thanks. .