On Wed, Mar 09, 2022 at 01:55:30PM -0800, Yang Shi wrote: > On Wed, Mar 9, 2022 at 1:15 AM Naoya Horiguchi > <naoya.horiguchi@xxxxxxxxx> wrote: > > > > From: Naoya Horiguchi <naoya.horiguchi@xxxxxxx> > > > > There is a race condition between memory_failure_hugetlb() and hugetlb > > free/demotion, which causes setting PageHWPoison flag on the wrong page > > (which was a hugetlb when memory_failrue() was called, but was removed > > or demoted when memory_failure_hugetlb() is called). This results in > > killing wrong processes. So set PageHWPoison flag with holding page lock, > > > > Signed-off-by: Naoya Horiguchi <naoya.horiguchi@xxxxxxx> > > --- > > mm/memory-failure.c | 27 ++++++++++++--------------- > > 1 file changed, 12 insertions(+), 15 deletions(-) > > > > diff --git a/mm/memory-failure.c b/mm/memory-failure.c > > index ac6492e36978..fe25eee8f9d6 100644 > > --- a/mm/memory-failure.c > > +++ b/mm/memory-failure.c > > @@ -1494,24 +1494,11 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags) > > int res; > > unsigned long page_flags; > > > > - if (TestSetPageHWPoison(head)) { > > - pr_err("Memory failure: %#lx: already hardware poisoned\n", > > - pfn); > > - res = -EHWPOISON; > > - if (flags & MF_ACTION_REQUIRED) > > - res = kill_accessing_process(current, page_to_pfn(head), flags); > > - return res; > > - } > > - > > - num_poisoned_pages_inc(); > > - > > if (!(flags & MF_COUNT_INCREASED)) { > > res = get_hwpoison_page(p, flags); > > I'm not an expert of hugetlb, I may be wrong. I'm wondering how this > could solve the race? Is the below race still possible? > > __get_hwpoison_page() > head = compound_head(page) > > hugetlb demotion (1G --> 2M) > get_hwpoison_huge_page(head, &hugetlb); Thanks for the comment. I assume Miaohe's patch below introduces additional check to detect the race. The patch calls compound_head() for the raw error page again, so the demotion case should be detected. I'll make the dependency clear in the commit log. https://lore.kernel.org/linux-mm/20220228140245.24552-2-linmiaohe@xxxxxxxxxx/ > > > Then the head may point to a 2M page, but the hwpoisoned subpage is > not in that 2M range? > > > > if (!res) { > > lock_page(head); > > if (hwpoison_filter(p)) { > > - if (TestClearPageHWPoison(head)) > > - num_poisoned_pages_dec(); > > unlock_page(head); > > return -EOPNOTSUPP; > > } > > @@ -1544,13 +1531,16 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags) > > page_flags = head->flags; > > > > if (hwpoison_filter(p)) { > > - if (TestClearPageHWPoison(head)) > > - num_poisoned_pages_dec(); > > put_page(p); > > res = -EOPNOTSUPP; > > goto out; > > } > > > > + if (TestSetPageHWPoison(head)) > > And I don't think "head" is still the head you expected if the race > happened. I think we need to re-retrieve the head once the page > refcount is bumped and locked. I think the above justification works for this. When the kernel reaches this line, the hugepage is properly pinned without being freed or demoted, so "head" is still pointing to the same head page as expected. Thanks, Naoya Horiguchi > > > + goto already_hwpoisoned; > > + > > + num_poisoned_pages_inc(); > > + > > /* > > * TODO: hwpoison for pud-sized hugetlb doesn't work right now, so > > * simply disable it. In order to make it work properly, we need > > @@ -1576,6 +1566,13 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags) > > out: > > unlock_page(head); > > return res; > > +already_hwpoisoned: > > + unlock_page(head); > > + pr_err("Memory failure: %#lx: already hardware poisoned\n", pfn); > > + res = -EHWPOISON; > > + if (flags & MF_ACTION_REQUIRED) > > + res = kill_accessing_process(current, page_to_pfn(head), flags); > > + return res; > > } > > > > static int memory_failure_dev_pagemap(unsigned long pfn, int flags, > > -- > > 2.25.1 > >