Re: + mm-hwpoison-avoid-the-impact-of-hwpoison_filter-return-value-on-mce-handler.patch added to -mm tree

On Mon, Feb 21, 2022 at 07:48:20PM -0800, Andrew Morton wrote:
> The patch titled
>      Subject: mm/hwpoison: avoid the impact of hwpoison_filter() return value on mce handler
> has been added to the -mm tree.  Its filename is
>      mm-hwpoison-avoid-the-impact-of-hwpoison_filter-return-value-on-mce-handler.patch

Andrew, please drop this patch from your queue. Review below:

> From: luofei <luofei@xxxxxxxxxxxx>
> Subject: mm/hwpoison: avoid the impact of hwpoison_filter() return value on mce handler
> 
> When the hwpoison page meets the filter conditions, it should not be
> regarded as successful memory_failure() processing for mce handler, but
> should return a value(-EHWPOISON), otherwise mce handler regards the error
> page has been identified and isolated, which may lead to calling
> set_mce_nospec() to change page attribute, etc.
> 
> Here a new MF_MCE_HANDLE flag is introduced to identify the call from the
> mce handler and instruct hwpoison_filter() to return -EHWPOISON, otherwise
> return 0 for compatibility with the hwpoison injector.
> 
> Link: https://lkml.kernel.org/r/20220221021415.2328992-1-luofei@xxxxxxxxxxxx
> Signed-off-by: luofei <luofei@xxxxxxxxxxxx>
> Cc: Tony Luck <tony.luck@xxxxxxxxx>
> Cc: Borislav Petkov <bp@xxxxxxxxx>
> Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> Cc: Ingo Molnar <mingo@xxxxxxxxxx>
> Cc: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
> Cc: Naoya Horiguchi <naoya.horiguchi@xxxxxxx>
> Cc: H. Peter Anvin <hpa@xxxxxxxxx>
> Cc: Fei Luo <luofei@xxxxxxxxxxxx>
> Cc: Miaohe Lin <linmiaohe@xxxxxxxxxx>
> Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> ---
> 
>  arch/x86/kernel/cpu/mce/core.c |   15 +++++++++------
>  include/linux/mm.h             |    1 +
>  mm/memory-failure.c            |   14 ++++++++++++--
>  3 files changed, 22 insertions(+), 8 deletions(-)
> 
> --- a/arch/x86/kernel/cpu/mce/core.c~mm-hwpoison-avoid-the-impact-of-hwpoison_filter-return-value-on-mce-handler
> +++ a/arch/x86/kernel/cpu/mce/core.c
> @@ -612,7 +612,7 @@ static int uc_decode_notifier(struct not
>  		return NOTIFY_DONE;
>  
>  	pfn = mce->addr >> PAGE_SHIFT;
> -	if (!memory_failure(pfn, 0)) {
> +	if (!memory_failure(pfn, MF_MCE_HANDLE)) {
>  		set_mce_nospec(pfn, whole_page(mce));
>  		mce->kflags |= MCE_HANDLED_UC;
>  	}
> @@ -1286,7 +1286,7 @@ static void kill_me_now(struct callback_
>  static void kill_me_maybe(struct callback_head *cb)
>  {
>  	struct task_struct *p = container_of(cb, struct task_struct, mce_kill_me);
> -	int flags = MF_ACTION_REQUIRED;
> +	int flags = MF_ACTION_REQUIRED | MF_MCE_HANDLE;
>  	int ret;
>  
>  	p->mce_count = 0;
> @@ -1303,9 +1303,12 @@ static void kill_me_maybe(struct callbac
>  	}
>  
>  	/*
> -	 * -EHWPOISON from memory_failure() means that it already sent SIGBUS
> -	 * to the current process with the proper error info, so no need to
> -	 * send SIGBUS here again.
> +	 * -EHWPOISON from memory_failure() means that memory_failure() did
> +	 * not handle the error event for the following reason:
> +	 *  - SIGBUS has already been sent to the current process with the
> +	 *    proper error info, or
> +	 *  - hwpoison_filter() filtered the event,
> +	 * so no need to deal with it more.

So you're overloading what we do for -EHWPOISON just because it fits?

The right way to do this is to return a *distinct* error value which
means "ignore this error" and the MCE code can then ignore it.

> --- a/include/linux/mm.h~mm-hwpoison-avoid-the-impact-of-hwpoison_filter-return-value-on-mce-handler
> +++ a/include/linux/mm.h
> @@ -3173,6 +3173,7 @@ enum mf_flags {
>  	MF_MUST_KILL = 1 << 2,
>  	MF_SOFT_OFFLINE = 1 << 3,
>  	MF_UNPOISON = 1 << 4,
> +	MF_MCE_HANDLE = 1 << 5,

This thing is too x86-specific and means nothing for other arches which
use memory_failure().

And I don't like this jumping through hoops one bit:

MCE code sets MF_MCE_HANDLE to tell the memory failure code to return a
special error which it does and then the MCE code again looks at
that special error. That's just nuts.

As said above, all you wanna do is have memory_failure() return a
distinct error value which says "ignore this error and don't do any
further processing". Without a new MF flag.
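
IOW, something along these lines. This is an untested userspace sketch,
not kernel code: mock_memory_failure(), mce_decide() and the enum names
are made up for illustration, and -EOPNOTSUPP is only a placeholder for
whatever distinct value ends up being picked.

```c
#include <assert.h>
#include <errno.h>

/* Linux defines EHWPOISON in errno.h; fallback for other systems. */
#ifndef EHWPOISON
#define EHWPOISON 133
#endif

/* Assumption: placeholder for the distinct "ignore this error" value. */
#define MF_IGNORED_ERRNO EOPNOTSUPP

/* Hypothetical outcomes the mock can be asked to simulate. */
enum mock_outcome {
	MOCK_RECOVERED,		/* page identified and isolated */
	MOCK_ALREADY_POISONED,	/* SIGBUS already sent */
	MOCK_FILTERED,		/* hwpoison_filter() filtered the event */
};

/* Hypothetical actions the caller can take on the return value. */
enum mce_action {
	MCE_MARK_HANDLED,	/* e.g. set_mce_nospec(), MCE_HANDLED_UC */
	MCE_NOTHING_MORE,	/* -EHWPOISON: nothing left to do */
	MCE_IGNORE,		/* filtered: do no further processing */
	MCE_UNRECOVERED,	/* any other error */
};

/* Stand-in for memory_failure(): no MF flag is needed to pick the code. */
static int mock_memory_failure(enum mock_outcome outcome)
{
	switch (outcome) {
	case MOCK_RECOVERED:
		return 0;
	case MOCK_ALREADY_POISONED:
		return -EHWPOISON;
	case MOCK_FILTERED:
		return -MF_IGNORED_ERRNO;
	}
	assert(0 && "unknown mock outcome");
	return -EINVAL;
}

/* Caller side: each return value keeps exactly one meaning. */
static enum mce_action mce_decide(int ret)
{
	if (!ret)
		return MCE_MARK_HANDLED;
	if (ret == -EHWPOISON)
		return MCE_NOTHING_MORE;
	if (ret == -MF_IGNORED_ERRNO)
		return MCE_IGNORE;
	return MCE_UNRECOVERED;
}
```

The point being: -EHWPOISON keeps its single existing meaning, the
filtered case gets its own value, and the caller does not have to pass
anything in to get it.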

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette