On Mon, Mar 21, 2022 at 05:46:48PM -0700, Yang Shi wrote:
> On Thu, Mar 17, 2022 at 10:16 PM Naoya Horiguchi
> <naoya.horiguchi@xxxxxxxxx> wrote:
> >
> > From: Naoya Horiguchi <naoya.horiguchi@xxxxxxx>
> >
> > There is a race condition between memory_failure_hugetlb() and hugetlb
> > free/demotion, which causes setting PageHWPoison flag on the wrong page.
> > The one simple result is that wrong processes can be killed, but another
> > (more serious) one is that the actual error is left unhandled, so no one
> > prevents later access to it, and that might lead to more serious results
> > like consuming corrupted data.
> >
> > Think about the below race window:
> >
> >   CPU 1                                   CPU 2
> >   memory_failure_hugetlb
> >   struct page *head = compound_head(p);
> >                                           hugetlb page might be freed to
> >                                           buddy, or even changed to another
> >                                           compound page.
> >
> >   get_hwpoison_page -- page is not what we want now...
> >
> > The compound_head is called outside hugetlb_lock, so the head is not
> > reliable.
> >
> > So set PageHWPoison flag after passing prechecks. And to detect
> > potential violation, this patch also introduces a new action type
> > MF_MSG_DIFFERENT_PAGE_SIZE.
> >
> > Reported-by: Mike Kravetz <mike.kravetz@xxxxxxxxxx>
> > Signed-off-by: Naoya Horiguchi <naoya.horiguchi@xxxxxxx>
> > Signed-off-by: Miaohe Lin <linmiaohe@xxxxxxxxxx>
> > Cc: <stable@xxxxxxxxxxxxxxx>
...
> > @@ -1547,21 +1545,31 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags)
> >                * If this happens just bail out.
> >                */
> >               if (!PageHuge(p) || compound_head(p) != head) {
> > +                     if (TestSetPageHWPoison(p))
> > +                             already_hwpoisoned = pfn;
> > +                     else
> > +                             num_poisoned_pages_inc();
> >                       action_result(pfn, MF_MSG_DIFFERENT_PAGE_SIZE, MF_IGNORED);
>
> The commit log says "this patch also introduces a new action type
> MF_MSG_DIFFERENT_PAGE_SIZE", but it is not defined in the patch and it
> is called here. Did I miss something?

Sorry, you're right. MF_MSG_DIFFERENT_PAGE_SIZE is defined in a separate
patch in mmotm, and the definition was unintentionally dropped when
rebasing. I'm thinking of rebasing this onto mainline again so that it
applies cleanly to -stable, expecting it to be applied before the other
recent hwpoison patches.

Thanks,
Naoya Horiguchi