On Fri, Apr 15, 2022 at 09:55:14AM +0800, Miaohe Lin wrote: > On 2022/4/15 1:56, Mike Kravetz wrote: > > On 4/8/22 06:53, Naoya Horiguchi wrote: > >> From: Naoya Horiguchi <naoya.horiguchi@xxxxxxx> > >> > >> There is a race condition between memory_failure_hugetlb() and hugetlb > >> free/demotion, which causes setting PageHWPoison flag on the wrong page. > >> The one simple result is that wrong processes can be killed, but another > >> (more serious) one is that the actual error is left unhandled, so no one > >> prevents later access to it, and that might lead to more serious results > >> like consuming corrupted data. > >> > >> Think about the below race window: > >> > >> CPU 1 CPU 2 > >> memory_failure_hugetlb > >> struct page *head = compound_head(p); > >> hugetlb page might be freed to > >> buddy, or even changed to another > >> compound page. > >> > >> get_hwpoison_page -- page is not what we want now... > >> > >> The current code first does prechecks roughly and then reconfirms > >> after taking refcount, but it's found that it makes code overly > >> complicated, so move the prechecks in a single hugetlb_lock range. > >> > >> A newly introduced function, try_memory_failure_hugetlb(), always > >> takes hugetlb_lock (even for non-hugetlb pages). That can be > >> improved, but memory_failure() is rare in principle, so should > >> not be a big problem. > ... > > > > The above code works as designed, but may be a bit confusing. If HPageFreed() > > we KNOW ref count is zero, so no need to even call get_page_unless_zero() as > > it will always return false in this case. It might be more clear if written > > as separate else if statements such as: > > > > } else if (HPageFreed(head)) { > > ret = 0; > > } else if (HPageMigratable(head)) { > > ret = get_page_unless_zero(head); > > if (ret) > > count_increased = true; > > > > This code here is consistent with the logic in get_hwpoison_huge_page. If change is required, > they might need to be changed together. > > BTW: They look a bit confusing for me at first but I get used to it later. ;) Thank you for comments, Mike, Miaohe. Patch 1/3 is to stable, so I'll submit a separate cleanup patch for these changes. > > Thanks! > > > Not insisting this be changed. Just easier to understand IMO. > > > > Again, thanks for your work on this! > > > > Reviewed-by: Mike Kravetz <mike.kravetz@xxxxxxxxxx> Thank you! - Naoya Horiguchi