On Wed, Mar 9, 2022 at 10:24 PM Miaohe Lin <linmiaohe@xxxxxxxxxx> wrote: > > On 2022/3/10 8:30, Yang Shi wrote: > > On Wed, Mar 9, 2022 at 4:01 PM HORIGUCHI NAOYA(堀口 直也) > > <naoya.horiguchi@xxxxxxx> wrote: > >> > >> On Wed, Mar 09, 2022 at 01:55:30PM -0800, Yang Shi wrote: > >>> On Wed, Mar 9, 2022 at 1:15 AM Naoya Horiguchi > >>> <naoya.horiguchi@xxxxxxxxx> wrote: > >>>> > >>>> From: Naoya Horiguchi <naoya.horiguchi@xxxxxxx> > >>>> > >>>> There is a race condition between memory_failure_hugetlb() and hugetlb > >>>> free/demotion, which causes setting PageHWPoison flag on the wrong page > >>>> (which was a hugetlb when memory_failrue() was called, but was removed > >>>> or demoted when memory_failure_hugetlb() is called). This results in > >>>> killing wrong processes. So set PageHWPoison flag with holding page lock, > >>>> > >>>> Signed-off-by: Naoya Horiguchi <naoya.horiguchi@xxxxxxx> > >>>> --- > >>>> mm/memory-failure.c | 27 ++++++++++++--------------- > >>>> 1 file changed, 12 insertions(+), 15 deletions(-) > >>>> > >>>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c > >>>> index ac6492e36978..fe25eee8f9d6 100644 > >>>> --- a/mm/memory-failure.c > >>>> +++ b/mm/memory-failure.c > >>>> @@ -1494,24 +1494,11 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags) > >>>> int res; > >>>> unsigned long page_flags; > >>>> > >>>> - if (TestSetPageHWPoison(head)) { > >>>> - pr_err("Memory failure: %#lx: already hardware poisoned\n", > >>>> - pfn); > >>>> - res = -EHWPOISON; > >>>> - if (flags & MF_ACTION_REQUIRED) > >>>> - res = kill_accessing_process(current, page_to_pfn(head), flags); > >>>> - return res; > >>>> - } > >>>> - > >>>> - num_poisoned_pages_inc(); > >>>> - > >>>> if (!(flags & MF_COUNT_INCREASED)) { > >>>> res = get_hwpoison_page(p, flags); > >>> > >>> I'm not an expert of hugetlb, I may be wrong. I'm wondering how this > >>> could solve the race? Is the below race still possible? > >>> > >>> __get_hwpoison_page() > >>> head = compound_head(page) > >>> > >>> hugetlb demotion (1G --> 2M) > >>> get_hwpoison_huge_page(head, &hugetlb); > >> > >> Thanks for the comment. > >> I assume Miaohe's patch below introduces additional check to detect the > >> race. The patch calls compound_head() for the raw error page again, so > >> the demotion case should be detected. I'll make the dependency clear in > >> the commit log. > >> > >> https://lore.kernel.org/linux-mm/20220228140245.24552-2-linmiaohe@xxxxxxxxxx/ > >> > >>> > >>> > >>> Then the head may point to a 2M page, but the hwpoisoned subpage is > >>> not in that 2M range? > >>> > >>> > >>>> if (!res) { > >>>> lock_page(head); > >>>> if (hwpoison_filter(p)) { > >>>> - if (TestClearPageHWPoison(head)) > >>>> - num_poisoned_pages_dec(); > >>>> unlock_page(head); > >>>> return -EOPNOTSUPP; > >>>> } > >>>> @@ -1544,13 +1531,16 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags) > >>>> page_flags = head->flags; > >>>> > >>>> if (hwpoison_filter(p)) { > >>>> - if (TestClearPageHWPoison(head)) > >>>> - num_poisoned_pages_dec(); > >>>> put_page(p); > >>>> res = -EOPNOTSUPP; > >>>> goto out; > >>>> } > >>>> > >>>> + if (TestSetPageHWPoison(head)) > >>> > >>> And I don't think "head" is still the head you expected if the race > >>> happened. I think we need to re-retrieve the head once the page > >>> refcount is bumped and locked. > >> > >> I think the above justification works for this. > >> When the kernel reaches this line, the hugepage is properly pinned without being > >> freed or demoted, so "head" is still pointing to the same head page as expected. > > > > I think Mike's comment in the earlier email works for this too. The > > huge page may get demoted before the page is pinned and locked, so the > > actual hwpoisoned subpage may belong to another smaller huge page now. > > > > I thinks Naoya assumes that there is a check before we use "head": > > diff --git a/mm/memory-failure.c b/mm/memory-failure.c > index 5444a8ef4867..0d7c58340a98 100644 > --- a/mm/memory-failure.c > +++ b/mm/memory-failure.c > @@ -1534,6 +1534,17 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags) > } > > lock_page(head); > + > + /** > + * The page could have changed compound pages due to race window. > + * If this happens just bail out. > + */ > + if (!PageHuge(p) || compound_head(p) != head) { > + action_result(pfn, MF_MSG_DIFFERENT_COMPOUND, MF_IGNORED); > + res = -EBUSY; > + goto out; > + } > + > page_flags = head->flags; > > if (hwpoison_filter(p)) { > -- > from: https://lore.kernel.org/linux-mm/20220228140245.24552-2-linmiaohe@xxxxxxxxxx/ Aha, thanks, I missed that. Yeah, we definitely need to revalidate the page. > > Thanks. > > > > >> > >> Thanks, > >> Naoya Horiguchi > >> > >>> > >>>> + goto already_hwpoisoned; > >>>> + > >>>> + num_poisoned_pages_inc(); > >>>> + > >>>> /* > >>>> * TODO: hwpoison for pud-sized hugetlb doesn't work right now, so > >>>> * simply disable it. In order to make it work properly, we need > >>>> @@ -1576,6 +1566,13 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags) > >>>> out: > >>>> unlock_page(head); > >>>> return res; > >>>> +already_hwpoisoned: > >>>> + unlock_page(head); > >>>> + pr_err("Memory failure: %#lx: already hardware poisoned\n", pfn); > >>>> + res = -EHWPOISON; > >>>> + if (flags & MF_ACTION_REQUIRED) > >>>> + res = kill_accessing_process(current, page_to_pfn(head), flags); > >>>> + return res; > >>>> } > >>>> > >>>> static int memory_failure_dev_pagemap(unsigned long pfn, int flags, > >>>> -- > >>>> 2.25.1 > >>>> > > . > > >