On 3/17/22 22:16, Naoya Horiguchi wrote: > From: Naoya Horiguchi <naoya.horiguchi@xxxxxxx> > > There is a race condition between memory_failure_hugetlb() and hugetlb > free/demotion, which causes setting PageHWPoison flag on the wrong page. > The one simple result is that wrong processes can be killed, but another > (more serious) one is that the actual error is left unhandled, so no one > prevents later access to it, and that might lead to more serious results > like consuming corrupted data. > > Think about the below race window: > > CPU 1 CPU 2 > memory_failure_hugetlb > struct page *head = compound_head(p); > hugetlb page might be freed to > buddy, or even changed to another > compound page. > > get_hwpoison_page -- page is not what we want now... > > The compound_head is called outside hugetlb_lock, so the head is not > reliable. > > So set PageHWPoison flag after passing prechecks. And to detect > potential violation, this patch also introduces a new action type > MF_MSG_DIFFERENT_PAGE_SIZE. > > Reported-by: Mike Kravetz <mike.kravetz@xxxxxxxxxx> > Signed-off-by: Naoya Horiguchi <naoya.horiguchi@xxxxxxx> > Signed-off-by: Miaohe Lin <linmiaohe@xxxxxxxxxx> > Cc: <stable@xxxxxxxxxxxxxxx> > --- > ChangeLog v4 -> v5: > - call TestSetPageHWPoison() when page_handle_poison() fails. > - call TestSetPageHWPoison() for unhandlable cases (MF_MSG_UNKNOWN and > MF_MSG_DIFFERENT_PAGE_SIZE). > - Set PageHWPoison on the head page only when the error page is surely > a hugepage, otherwise set the flag on the raw page. > - rebased onto v5.17-rc8-mmotm-2022-03-16-17-42 Thanks for all the updates! I can not find any issues with these changes. However, this code is very difficult to follow/understand. Reviewed-by: Mike Kravetz <mike.kravetz@xxxxxxxxxx> -- Mike Kravetz