Re: [PATCH] ACPI: APEI: set memory failure flags as MF_ACTION_REQUIRED on action required events

On Thu, Oct 27, 2022 at 6:25 AM Shuai Xue <xueshuai@xxxxxxxxxxxxxxxxx> wrote:
>
> There are two major types of uncorrected error (UC):
>
> - Action Required: The error is detected and the processor has already consumed
>   the memory. The OS is required to take action (for example, offline the
>   failing page or kill the affected thread) to recover from this uncorrectable error.
>
> - Action Optional: The error is detected outside the processor's execution
>   context. Some data in memory is corrupted, but it has not been consumed.
>   The OS may optionally take action to recover from this uncorrectable error.
>
> On x86 platforms, we can easily distinguish between these two types
> based on the MCA bank. On arm64 platforms, however, the memory failure
> flags for all UCs whose severity is GHES_SEV_RECOVERABLE are currently
> set to 0, i.e., they are all treated as Action Optional.
>
> If a UC is detected by a background scrubber, it is clearly an Action
> Optional error. All other errors should conservatively be treated
> as Action Required.
>
> cper_sec_mem_err::error_type identifies the type of error that occurred
> if CPER_MEM_VALID_ERROR_TYPE is set. So, set the memory failure flags to 0
> for Scrub Uncorrected Error (type 14). For all other errors, including
> those whose error type is not valid, set them to MF_ACTION_REQUIRED.
>
> Signed-off-by: Shuai Xue <xueshuai@xxxxxxxxxxxxxxxxx>

I need input from the APEI reviewers on this.

Thanks!

> ---
>  drivers/acpi/apei/ghes.c | 10 ++++++++--
>  include/linux/cper.h     |  3 +++
>  2 files changed, 11 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
> index 80ad530583c9..6c03059cbfc6 100644
> --- a/drivers/acpi/apei/ghes.c
> +++ b/drivers/acpi/apei/ghes.c
> @@ -474,8 +474,14 @@ static bool ghes_handle_memory_failure(struct acpi_hest_generic_data *gdata,
>         if (sec_sev == GHES_SEV_CORRECTED &&
>             (gdata->flags & CPER_SEC_ERROR_THRESHOLD_EXCEEDED))
>                 flags = MF_SOFT_OFFLINE;
> -       if (sev == GHES_SEV_RECOVERABLE && sec_sev == GHES_SEV_RECOVERABLE)
> -               flags = 0;
> +       if (sev == GHES_SEV_RECOVERABLE && sec_sev == GHES_SEV_RECOVERABLE) {
> +               if (mem_err->validation_bits & CPER_MEM_VALID_ERROR_TYPE)
> +                       flags = mem_err->error_type == CPER_MEM_SCRUB_UC ?
> +                                       0 :
> +                                       MF_ACTION_REQUIRED;
> +               else
> +                       flags = MF_ACTION_REQUIRED;
> +       }
>
>         if (flags != -1)
>                 return ghes_do_memory_failure(mem_err->physical_addr, flags);
> diff --git a/include/linux/cper.h b/include/linux/cper.h
> index eacb7dd7b3af..b77ab7636614 100644
> --- a/include/linux/cper.h
> +++ b/include/linux/cper.h
> @@ -235,6 +235,9 @@ enum {
>  #define CPER_MEM_VALID_BANK_ADDRESS            0x100000
>  #define CPER_MEM_VALID_CHIP_ID                 0x200000
>
> +#define CPER_MEM_SCRUB_CE                      13
> +#define CPER_MEM_SCRUB_UC                      14
> +
>  #define CPER_MEM_EXT_ROW_MASK                  0x3
>  #define CPER_MEM_EXT_ROW_SHIFT                 16
>
> --
> 2.20.1.9.gb50a0d7
>


