答复: [PATCH] mm/hwpoison: In-use hugepage need filter judgement and filter return value should be consistent

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



>IIRC, hwpoison_filter() is expected to ignore error events on
>irrelevant target (page|device|cgroup) for testers to focus on
>non-filtered events.  So filtered events should not be considered
>as error-handling failures (otherwise testers can't easily
>distinguish them from real error-handling failures).
>
>But yes, the MCE handler depends on the return value of memory_failure(),
>so filtered events shouldn't be considered as error-handling's successes.
>So one possible approach is that we introduce a new mf_flags to show that
>memory_failure() is called from MCE handler (not from hwpoison injector),
>and we switch return code when hwpoison_filter failed by this flag.
>Could you try it if this approach works for you?  Or if you have any better
>idea, feel free to share it.


Since hwpoison_filter() is expected to ignore error events on irrelevant taregt,

introduce a new mf_flags to differentiate between mce handler and hwpoison

injector is a good way, this can avoid the impact of hwpoison_filter() on the mce

handler. I will resubmit a patch


发件人: HORIGUCHI NAOYA(堀口 直也) <naoya.horiguchi@xxxxxxx>
发送时间: 2022年2月16日 9:44:13
收件人: 罗飞
抄送: akpm@xxxxxxxxxxxxxxxxxxxx; linux-mm@xxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx
主题: Re: [PATCH] mm/hwpoison: In-use hugepage need filter judgement and filter return value should be consistent
 
On Thu, Feb 10, 2022 at 03:58:46AM -0500, luofei wrote:
> After successfully obtaining the reference count of the huge
> page, it is still necessary to call hwpoison_filter() to make a
> filter judgement, otherwise the filter page will be unmaped and
> the related process may be killed.
>
> Also when the huge page meets the filter conditions, it should
> not be regarded as successful memory_failure() processing, but
> should return an error code to inform the caller, otherwise the
> caller regards the error page has been identified and isolated,
> which may lead to calling set_mce_nospec() to change page attribute.

Hi luofei,

I agree with inserting another hwpoison_filter() block which was
simply missed. Thanks for pointing it out.

>
> Meanwhile, when calling hwpoison_filter() to determine that a page
> needs to be filtered, it is better to keep the same EBUSY return
> value, so that the caller can recognize that memory_failure() did
> not successfully process the error page.

IIRC, hwpoison_filter() is expected to ignore error events on
irrelevant target (page|device|cgroup) for testers to focus on
non-filtered events.  So filtered events should not be considered
as error-handling failures (otherwise testers can't easily
distinguish them from real error-handling failures).

But yes, the MCE handler depends on the return value of memory_failure(),
so filtered events shouldn't be considered as error-handling's successes.
So one possible approach is that we introduce a new mf_flags to show that
memory_failure() is called from MCE handler (not from hwpoison injector),
and we switch return code when hwpoison_filter failed by this flag.
Could you try it if this approach works for you?  Or if you have any better
idea, feel free to share it.

Thanks,
Naoya Horiguchi

>
> Signed-off-by: luofei <luofei@xxxxxxxxxxxx>
> ---
>  mm/hwpoison-inject.c |  2 +-
>  mm/memory-failure.c  | 12 ++++++++++--
>  2 files changed, 11 insertions(+), 3 deletions(-)
>
> diff --git a/mm/hwpoison-inject.c b/mm/hwpoison-inject.c
> index aff4d27ec235..4ecd77cd1ded 100644
> --- a/mm/hwpoison-inject.c
> +++ b/mm/hwpoison-inject.c
> @@ -44,7 +44,7 @@ static int hwpoison_inject(void *data, u64 val)
>         */
>        err = hwpoison_filter(hpage);
>        if (err)
> -             return 0;
> +             return -EBUSY;

>  inject:
>        pr_info("Injecting memory failure at pfn %#lx\n", pfn);
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 97a9ed8f87a9..acf97fb2659a 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1526,7 +1526,7 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags)
>                                if (TestClearPageHWPoison(head))
>                                        num_poisoned_pages_dec();
>                                unlock_page(head);
> -                             return 0;
> +                             return -EBUSY;
>                        }
>                        unlock_page(head);
>                        res = MF_FAILED;
> @@ -1545,6 +1545,13 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags)
>        lock_page(head);
>        page_flags = head->flags;

> +     if (hwpoison_filter(p)) {
> +             if (TestClearPageHWPoison(head))
> +                     num_poisoned_pages_dec();
> +             unlock_page(head);
> +             res = -EBUSY
> +             goto out;
> +     }
>        /*
>         * TODO: hwpoison for pud-sized hugetlb doesn't work right now, so
>         * simply disable it. In order to make it work properly, we need
> @@ -1613,7 +1620,7 @@ static int memory_failure_dev_pagemap(unsigned long pfn, int flags,
>                goto out;

>        if (hwpoison_filter(page)) {
> -             rc = 0;
> +             rc = -EBUSY;
>                goto unlock;
>        }

> @@ -1837,6 +1844,7 @@ int memory_failure(unsigned long pfn, int flags)
>                        num_poisoned_pages_dec();
>                unlock_page(p);
>                put_page(p);
> +             res = -EBUSY;
>                goto unlock_mutex;
>        }

> --
> 2.27.0

[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux