On 2022/4/15 1:56, Mike Kravetz wrote:
> On 4/8/22 06:53, Naoya Horiguchi wrote:
>> From: Naoya Horiguchi <naoya.horiguchi@xxxxxxx>
>>
>> There is a race condition between memory_failure_hugetlb() and hugetlb
>> free/demotion, which causes setting PageHWPoison flag on the wrong page.
>> The one simple result is that wrong processes can be killed, but another
>> (more serious) one is that the actual error is left unhandled, so no one
>> prevents later access to it, and that might lead to more serious results
>> like consuming corrupted data.
>>
>> Think about the below race window:
>>
>>   CPU 1                                   CPU 2
>>   memory_failure_hugetlb
>>   struct page *head = compound_head(p);
>>                                           hugetlb page might be freed to
>>                                           buddy, or even changed to another
>>                                           compound page.
>>
>>   get_hwpoison_page -- page is not what we want now...
>>
>> The current code first does prechecks roughly and then reconfirms
>> after taking refcount, but it's found that it makes code overly
>> complicated, so move the prechecks in a single hugetlb_lock range.
>>
>> A newly introduced function, try_memory_failure_hugetlb(), always
>> takes hugetlb_lock (even for non-hugetlb pages). That can be
>> improved, but memory_failure() is rare in principle, so should
>> not be a big problem.
...
>
> The above code works as designed, but may be a bit confusing. If HPageFreed()
> we KNOW ref count is zero, so no need to even call get_page_unless_zero() as
> it will always return false in this case. It might be more clear if written
> as separate else if statements such as:
>
> 	} else if (HPageFreed(head)) {
> 		ret = 0;
> 	} else if (HPageMigratable(head)) {
> 		ret = get_page_unless_zero(head);
> 		if (ret)
> 			count_increased = true;
>

This code here is consistent with the logic in get_hwpoison_huge_page.
If a change is required, they might need to be changed together.

BTW: they looked a bit confusing to me at first, but I got used to them
later. ;)

Thanks!

> Not insisting this be changed. Just easier to understand IMO.
>
> Again, thanks for your work on this!
>
> Reviewed-by: Mike Kravetz <mike.kravetz@xxxxxxxxxx>
>
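
P.S. For readers following along, here is a simplified sketch of the
pattern being discussed, i.e. doing the prechecks and the refcount grab
inside a single hugetlb_lock section. The helper name is made up for
illustration and the -EBUSY fallback is paraphrased, so please don't
read it as the exact code in the patch or in get_hwpoison_huge_page:

	/*
	 * Illustrative sketch only (hypothetical helper name); assumes it
	 * lives in mm/hugetlb.c, where hugetlb_lock is visible.
	 * Returns 1 with a reference taken, 0 if the page looks free,
	 * or -EBUSY otherwise.
	 */
	static int hugetlb_hwpoison_precheck(struct page *head,
					     bool *count_increased)
	{
		int ret = -EBUSY;

		spin_lock_irq(&hugetlb_lock);
		if (PageHeadHuge(head) &&
		    (HPageFreed(head) || HPageMigratable(head))) {
			/*
			 * HPageFreed() implies a zero refcount, so
			 * get_page_unless_zero() can only succeed in the
			 * HPageMigratable() case. That is why this combined
			 * condition behaves the same as the split "else if"
			 * form quoted above.
			 */
			ret = get_page_unless_zero(head);
			if (ret)
				*count_increased = true;
		}
		spin_unlock_irq(&hugetlb_lock);
		return ret;
	}

The key point is that the precheck and the get_page_unless_zero() call
happen in the same hugetlb_lock critical section, so the hugetlb page
cannot be freed or demoted between checking it and taking the reference.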