Re: [PATCH 2/2] acpi: apei: handle SEI notification type for ARMv8

Xie XiuQi <xiexiuqi@xxxxxxxxxx> · Mon, 6 Mar 2017 19:06:32 +0800

Hi James,

Thanks for your comments.

On 2017/3/6 18:00, James Morse wrote:
> Hi Xie XiuQi,
> 
> On 03/03/17 10:39, Xie XiuQi wrote:
>> ARM APEI extension proposal added SEI (asynchronous SError interrupt)
>> notification type for ARMv8.
>>
>> Add a new GHES error source handling function for SEI. In firmware
>> first mode, if an error source's notification type is SEI. Then GHES
>> could parse and report the detail error information.
> 
> This patch doesn't apply to any upstream tree. Is this based on Tyler's larger
> UEFI/ACPI update series? If so, please mention this in your cover letter, (Nit:
> please include a cover letter when sending two or more patches!).
> 

Yes, this patch is based on Tyler's series "[PATCH V11 00/10] Add UEFI 2.6 and ACPI 6.1 updates
for RAS on ARM64" and linux-next 20170302.

I'll add a cover letter next time, thanks.

> What happens if the SError Interrupt arrives while KVM was doing its work? We
> set the HCR_EL2.AMO bit when running a guest, so KVM may receive these instead
> of the host kernel.
> 

OK, I'll handle this case in the next version.

> 
>> diff --git a/drivers/acpi/apei/Kconfig b/drivers/acpi/apei/Kconfig
>> index 1122d7f..a32f046 100644
>> --- a/drivers/acpi/apei/Kconfig
>> +++ b/drivers/acpi/apei/Kconfig
>> @@ -18,6 +18,20 @@ config HAVE_ACPI_APEI_SEA
>>  	  option allows the OS to look for such hardware error record, and
>>  	  take appropriate action.
>>  
>> +config ACPI_APEI_SEI
>> +	bool "APEI Asynchronous SError Interrupt logging/recovering support"
>> +	depends on ARM64 && ACPI_APEI_GHES
>> +	help
>> +	  This option should be enabled if the system supports
>> +	  firmware first handling of SEI (asynchronous SError interrupt).
>> +
>> +	  SEI happens with invalid instruction access or asynchronous exceptions
>> +	  on ARMv8 systems. If a system supports firmware first handling of SEI,
>> +	  the platform analyzes and handles hardware error notifications from
>> +	  SEI, and it may then form a HW error record for the OS to parse and
>> +	  handle. This option allows the OS to look for such hardware error
>> +	  record, and take appropriate action.
>> +
>>  config ACPI_APEI
>>  	bool "ACPI Platform Error Interface (APEI)"
>>  	select MISC_FILESYSTEMS
>> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
>> index 3e4ea1b..d084a09 100644
>> --- a/drivers/acpi/apei/ghes.c
>> +++ b/drivers/acpi/apei/ghes.c
>> @@ -850,6 +850,50 @@ static inline void ghes_sea_remove(struct ghes *ghes)
>>  }
>>  #endif /* CONFIG_HAVE_ACPI_APEI_SEA */
>>  
>> +#ifdef CONFIG_ACPI_APEI_SEI
>> +static LIST_HEAD(ghes_sei);
>> +
>> +void ghes_notify_sei(void)
>> +{
>> +	struct ghes *ghes;
>> +
>> +	/*
>> +	 * synchronize_rcu() will wait for nmi_exit(), so no need to
> 
> Where nmi_exit()?
> 
> This nmi enter/exit was to prevent APEI being interrupted by APEI and trying to
> take the same set of locks. APEI masks IRQs to prevent this happening normally,
> but Synchronous External Abort couldn't be masked.
> We don't mask Asynchronous Exceptions in APEI so the same thing can happen here.
> Adding nmi_{enter,exit}() round the ghes call in the arch bad_mode() will
> prevent this lockup.
> 

Thank you for the detailed explanation. I'll add nmi_{enter,exit}() around the
ghes call in the next version.
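
To check my understanding of the lockup, here is a small user-space model. It
is not kernel code: the model_* functions and the two lock flags are made up
for illustration, and it assumes the GHES code picks a separate NMI-safe lock
whenever in_nmi() is true. It only shows why marking the SError path as
NMI-like keeps it away from a lock that the interrupted code may already hold.

```c
#include <stdbool.h>

/* User-space model only; none of these are real kernel symbols. */
static int  nmi_depth;      /* stand-in for the kernel's NMI nesting count */
static bool lock_irq_held;  /* stand-in for the lock used in IRQ context */
static bool lock_nmi_held;  /* stand-in for the lock used in NMI-like context */

static void model_nmi_enter(void) { nmi_depth++; }
static void model_nmi_exit(void)  { nmi_depth--; }
static bool model_in_nmi(void)    { return nmi_depth > 0; }

/*
 * Process one error record. Returns 0 on success, or -1 if the lock it
 * needs is already held (a real spinlock would spin forever here).
 */
static int model_ghes_proc(void)
{
	/* Assumption: the lock is chosen by context, as described above. */
	bool *lock = model_in_nmi() ? &lock_nmi_held : &lock_irq_held;

	if (*lock)
		return -1;
	*lock = true;
	/* ... read and report the error record ... */
	*lock = false;
	return 0;
}

/* An SError arrives while the IRQ-context path is holding its lock. */
static int model_sei_during_irq_path(bool use_nmi_enter)
{
	int ret;

	lock_irq_held = true;       /* interrupted code holds the IRQ lock */
	if (use_nmi_enter)
		model_nmi_enter();
	ret = model_ghes_proc();    /* what bad_mode() would end up calling */
	if (use_nmi_enter)
		model_nmi_exit();
	lock_irq_held = false;      /* interrupted code resumes and unlocks */
	return ret;
}
```

In this model, model_sei_during_irq_path(true) succeeds because the handler
uses the NMI-side lock, while model_sei_during_irq_path(false) hits the lock
that is already held. Please correct me if the real code paths differ.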

Thanks,
Xie XiuQi
