Re: [PATCH] ACPI: APEI: set memory failure flags as MF_ACTION_REQUIRED on action required events

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




在 2022/10/29 AM1:08, Rafael J. Wysocki 写道:
> On Thu, Oct 27, 2022 at 6:25 AM Shuai Xue <xueshuai@xxxxxxxxxxxxxxxxx> wrote:
>>
>> There are two major types of uncorrected error (UC) :
>>
>> - Action Required: The error is detected and the processor already consumes the
>>   memory. OS requires to take action (for example, offline failure page/kill
>>   failure thread) to recover this uncorrectable error.
>>
>> - Action Optional: The error is detected out of processor execution context.
>>   Some data in the memory are corrupted. But the data have not been consumed.
>>   OS is optional to take action to recover this uncorrectable error.
>>
>> For X86 platforms, we can easily distinguish between these two types
>> based on the MCA Bank. While for arm64 platform, the memory failure
>> flags for all UCs which severity are GHES_SEV_RECOVERABLE are set as 0,
>> a.k.a, Action Optional now.
>>
>> If UC is detected by a background scrubber, it is obviously an Action
>> Optional error.  For other errors, we should conservatively regard them
>> as Action Required.
>>
>> cper_sec_mem_err::error_type identifies the type of error that occurred
>> if CPER_MEM_VALID_ERROR_TYPE is set. So, set memory failure flags as 0
>> for Scrub Uncorrected Error (type 14). Otherwise, set memory failure
>> flags as MF_ACTION_REQUIRED.
>>
>> Signed-off-by: Shuai Xue <xueshuai@xxxxxxxxxxxxxxxxx>
> 
> I need input from the APEI reviewers on this.
> 
> Thanks!

Hi, Rafael,

Sorry, I missed this email. Thank you for you quick reply. Let's discuss with
reviewers.

Thank you.

Cheers,
Shuai


> 
>> ---
>>  drivers/acpi/apei/ghes.c | 10 ++++++++--
>>  include/linux/cper.h     |  3 +++
>>  2 files changed, 11 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
>> index 80ad530583c9..6c03059cbfc6 100644
>> --- a/drivers/acpi/apei/ghes.c
>> +++ b/drivers/acpi/apei/ghes.c
>> @@ -474,8 +474,14 @@ static bool ghes_handle_memory_failure(struct acpi_hest_generic_data *gdata,
>>         if (sec_sev == GHES_SEV_CORRECTED &&
>>             (gdata->flags & CPER_SEC_ERROR_THRESHOLD_EXCEEDED))
>>                 flags = MF_SOFT_OFFLINE;
>> -       if (sev == GHES_SEV_RECOVERABLE && sec_sev == GHES_SEV_RECOVERABLE)
>> -               flags = 0;
>> +       if (sev == GHES_SEV_RECOVERABLE && sec_sev == GHES_SEV_RECOVERABLE) {
>> +               if (mem_err->validation_bits & CPER_MEM_VALID_ERROR_TYPE)
>> +                       flags = mem_err->error_type == CPER_MEM_SCRUB_UC ?
>> +                                       0 :
>> +                                       MF_ACTION_REQUIRED;
>> +               else
>> +                       flags = MF_ACTION_REQUIRED;
>> +       }
>>
>>         if (flags != -1)
>>                 return ghes_do_memory_failure(mem_err->physical_addr, flags);
>> diff --git a/include/linux/cper.h b/include/linux/cper.h
>> index eacb7dd7b3af..b77ab7636614 100644
>> --- a/include/linux/cper.h
>> +++ b/include/linux/cper.h
>> @@ -235,6 +235,9 @@ enum {
>>  #define CPER_MEM_VALID_BANK_ADDRESS            0x100000
>>  #define CPER_MEM_VALID_CHIP_ID                 0x200000
>>
>> +#define CPER_MEM_SCRUB_CE                      13
>> +#define CPER_MEM_SCRUB_UC                      14
>> +
>>  #define CPER_MEM_EXT_ROW_MASK                  0x3
>>  #define CPER_MEM_EXT_ROW_SHIFT                 16
>>
>> --
>> 2.20.1.9.gb50a0d7
>>



[Index of Archives]     [Linux Kernel]     [Kernel Development Newbies]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite Hiking]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux