On Thu, Oct 27, 2022 at 6:25 AM Shuai Xue <xueshuai@xxxxxxxxxxxxxxxxx> wrote:
>
> There are two major types of uncorrected error (UC) :
>
> - Action Required: The error is detected and the processor already consumes the
>   memory. OS requires to take action (for example, offline failure page/kill
>   failure thread) to recover this uncorrectable error.
>
> - Action Optional: The error is detected out of processor execution context.
>   Some data in the memory are corrupted. But the data have not been consumed.
>   OS is optional to take action to recover this uncorrectable error.
>
> For X86 platforms, we can easily distinguish between these two types
> based on the MCA Bank. While for arm64 platform, the memory failure
> flags for all UCs which severity are GHES_SEV_RECOVERABLE are set as 0,
> a.k.a, Action Optional now.
>
> If UC is detected by a background scrubber, it is obviously an Action
> Optional error. For other errors, we should conservatively regard them
> as Action Required.
>
> cper_sec_mem_err::error_type identifies the type of error that occurred
> if CPER_MEM_VALID_ERROR_TYPE is set. So, set memory failure flags as 0
> for Scrub Uncorrected Error (type 14). Otherwise, set memory failure
> flags as MF_ACTION_REQUIRED.
>
> Signed-off-by: Shuai Xue <xueshuai@xxxxxxxxxxxxxxxxx>

I need input from the APEI reviewers on this.

Thanks!

> ---
>  drivers/acpi/apei/ghes.c | 10 ++++++++--
>  include/linux/cper.h     |  3 +++
>  2 files changed, 11 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
> index 80ad530583c9..6c03059cbfc6 100644
> --- a/drivers/acpi/apei/ghes.c
> +++ b/drivers/acpi/apei/ghes.c
> @@ -474,8 +474,14 @@ static bool ghes_handle_memory_failure(struct acpi_hest_generic_data *gdata,
>  	if (sec_sev == GHES_SEV_CORRECTED &&
>  	    (gdata->flags & CPER_SEC_ERROR_THRESHOLD_EXCEEDED))
>  		flags = MF_SOFT_OFFLINE;
> -	if (sev == GHES_SEV_RECOVERABLE && sec_sev == GHES_SEV_RECOVERABLE)
> -		flags = 0;
> +	if (sev == GHES_SEV_RECOVERABLE && sec_sev == GHES_SEV_RECOVERABLE) {
> +		if (mem_err->validation_bits & CPER_MEM_VALID_ERROR_TYPE)
> +			flags = mem_err->error_type == CPER_MEM_SCRUB_UC ?
> +					0 :
> +					MF_ACTION_REQUIRED;
> +		else
> +			flags = MF_ACTION_REQUIRED;
> +	}
>
>  	if (flags != -1)
>  		return ghes_do_memory_failure(mem_err->physical_addr, flags);
> diff --git a/include/linux/cper.h b/include/linux/cper.h
> index eacb7dd7b3af..b77ab7636614 100644
> --- a/include/linux/cper.h
> +++ b/include/linux/cper.h
> @@ -235,6 +235,9 @@ enum {
>  #define CPER_MEM_VALID_BANK_ADDRESS		0x100000
>  #define CPER_MEM_VALID_CHIP_ID			0x200000
>
> +#define CPER_MEM_SCRUB_CE			13
> +#define CPER_MEM_SCRUB_UC			14
> +
>  #define CPER_MEM_EXT_ROW_MASK			0x3
>  #define CPER_MEM_EXT_ROW_SHIFT			16
>
> --
> 2.20.1.9.gb50a0d7
>
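
For anyone reviewing the flag selection in isolation, the nested conditional in the ghes.c hunk can be restated as a small userspace sketch. This is only an illustration, not part of the patch: recoverable_uc_mf_flags() and mf_flags_selfcheck() are hypothetical helpers, and the numeric values given for CPER_MEM_VALID_ERROR_TYPE and MF_ACTION_REQUIRED are local stand-ins that I believe match include/linux/cper.h and include/linux/mm.h, so please check them against your tree. CPER_MEM_SCRUB_UC is taken from the patch itself.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Local stand-ins for kernel definitions (assumed values, verify against
 * include/linux/cper.h and include/linux/mm.h before relying on them).
 */
#define CPER_MEM_VALID_ERROR_TYPE	0x4000
#define CPER_MEM_SCRUB_UC		14	/* value added by the patch */
#define MF_ACTION_REQUIRED		(1 << 1)

/*
 * Hypothetical helper: memory failure flags for a section where both
 * sev and sec_sev are GHES_SEV_RECOVERABLE. Only an error explicitly
 * reported as a scrub UC is treated as Action Optional (flags == 0);
 * a missing or different error type falls back to Action Required.
 */
static int recoverable_uc_mf_flags(uint64_t validation_bits, uint8_t error_type)
{
	if ((validation_bits & CPER_MEM_VALID_ERROR_TYPE) &&
	    error_type == CPER_MEM_SCRUB_UC)
		return 0;

	return MF_ACTION_REQUIRED;
}

/* Hypothetical self-check covering the three branches of the patch. */
static void mf_flags_selfcheck(void)
{
	/* Valid type, scrub UC: Action Optional. */
	assert(recoverable_uc_mf_flags(CPER_MEM_VALID_ERROR_TYPE,
				       CPER_MEM_SCRUB_UC) == 0);
	/* Valid type, any other error: Action Required. */
	assert(recoverable_uc_mf_flags(CPER_MEM_VALID_ERROR_TYPE, 3) ==
	       MF_ACTION_REQUIRED);
	/* Type field not valid: Action Required, even if it reads 14. */
	assert(recoverable_uc_mf_flags(0, CPER_MEM_SCRUB_UC) ==
	       MF_ACTION_REQUIRED);
}
```

Written this way, the two MF_ACTION_REQUIRED branches of the patch collapse into a single default, which may or may not be preferable to the ternary in the posted version.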