>IIRC, hwpoison_filter() is expected to ignore error events on
>irrelevant target (page|device|cgroup) for testers to focus on
>non-filtered events. So filtered events should not be considered
>as error-handling failures (otherwise testers can't easily
>distinguish them from real error-handling failures).
>
>But yes, the MCE handler depends on the return value of memory_failure(),
>so filtered events shouldn't be considered as error-handling's successes.
>So one possible approach is that we introduce a new mf_flags to show that
>memory_failure() is called from MCE handler (not from hwpoison injector),
>and we switch return code when hwpoison_filter failed by this flag.
>Could you try it if this approach works for you? Or if you have any better
>idea, feel free to share it.
Since hwpoison_filter() is expected to ignore error events on irrelevant taregt,
introduce
a new mf_flags to differentiate between mce handler and hwpoison
injector is a good way, this can avoid the impact of hwpoison_filter() on the mce
handler. I will resubmit a patch
发送时间: 2022年2月16日 9:44:13
收件人: 罗飞
抄送: akpm@xxxxxxxxxxxxxxxxxxxx; linux-mm@xxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx
主题: Re: [PATCH] mm/hwpoison: In-use hugepage need filter judgement and filter return value should be consistent
> After successfully obtaining the reference count of the huge
> page, it is still necessary to call hwpoison_filter() to make a
> filter judgement, otherwise the filter page will be unmaped and
> the related process may be killed.
>
> Also when the huge page meets the filter conditions, it should
> not be regarded as successful memory_failure() processing, but
> should return an error code to inform the caller, otherwise the
> caller regards the error page has been identified and isolated,
> which may lead to calling set_mce_nospec() to change page attribute.
Hi luofei,
I agree with inserting another hwpoison_filter() block which was
simply missed. Thanks for pointing it out.
>
> Meanwhile, when calling hwpoison_filter() to determine that a page
> needs to be filtered, it is better to keep the same EBUSY return
> value, so that the caller can recognize that memory_failure() did
> not successfully process the error page.
IIRC, hwpoison_filter() is expected to ignore error events on
irrelevant target (page|device|cgroup) for testers to focus on
non-filtered events. So filtered events should not be considered
as error-handling failures (otherwise testers can't easily
distinguish them from real error-handling failures).
But yes, the MCE handler depends on the return value of memory_failure(),
so filtered events shouldn't be considered as error-handling's successes.
So one possible approach is that we introduce a new mf_flags to show that
memory_failure() is called from MCE handler (not from hwpoison injector),
and we switch return code when hwpoison_filter failed by this flag.
Could you try it if this approach works for you? Or if you have any better
idea, feel free to share it.
Thanks,
Naoya Horiguchi
>
> Signed-off-by: luofei <luofei@xxxxxxxxxxxx>
> ---
> mm/hwpoison-inject.c | 2 +-
> mm/memory-failure.c | 12 ++++++++++--
> 2 files changed, 11 insertions(+), 3 deletions(-)
>
> diff --git a/mm/hwpoison-inject.c b/mm/hwpoison-inject.c
> index aff4d27ec235..4ecd77cd1ded 100644
> --- a/mm/hwpoison-inject.c
> +++ b/mm/hwpoison-inject.c
> @@ -44,7 +44,7 @@ static int hwpoison_inject(void *data, u64 val)
> */
> err = hwpoison_filter(hpage);
> if (err)
> - return 0;
> + return -EBUSY;
>
> inject:
> pr_info("Injecting memory failure at pfn %#lx\n", pfn);
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 97a9ed8f87a9..acf97fb2659a 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1526,7 +1526,7 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags)
> if (TestClearPageHWPoison(head))
> num_poisoned_pages_dec();
> unlock_page(head);
> - return 0;
> + return -EBUSY;
> }
> unlock_page(head);
> res = MF_FAILED;
> @@ -1545,6 +1545,13 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags)
> lock_page(head);
> page_flags = head->flags;
>
> + if (hwpoison_filter(p)) {
> + if (TestClearPageHWPoison(head))
> + num_poisoned_pages_dec();
> + unlock_page(head);
> + res = -EBUSY
> + goto out;
> + }
> /*
> * TODO: hwpoison for pud-sized hugetlb doesn't work right now, so
> * simply disable it. In order to make it work properly, we need
> @@ -1613,7 +1620,7 @@ static int memory_failure_dev_pagemap(unsigned long pfn, int flags,
> goto out;
>
> if (hwpoison_filter(page)) {
> - rc = 0;
> + rc = -EBUSY;
> goto unlock;
> }
>
> @@ -1837,6 +1844,7 @@ int memory_failure(unsigned long pfn, int flags)
> num_poisoned_pages_dec();
> unlock_page(p);
> put_page(p);
> + res = -EBUSY;
> goto unlock_mutex;
> }
>
> --
> 2.27.0