Re: [PATCH 1/1] efi: cper: print AER info of PCIe fatal error

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi James,
Thanks for the review.

On 2019/7/25 20:44, James Morse wrote:
> Hi,
> 
> On 12/07/2019 03:20, Xiaofei Tan wrote:
>> AER info of PCIe fatal error is not printed in the current driver.
>> Because APEI driver will panic directly for fatal error, and can't
>> run to the place of printing AER info.
>>
>> An example log is as following:
>> [ 3157.655028] {763}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 11
>> [ 3157.663610] {763}[Hardware Error]: event severity: fatal
>> [ 3157.663612] {763}[Hardware Error]:  Error 0, type: fatal
>> [ 3157.663614] {763}[Hardware Error]:   section_type: PCIe error
>> [ 3157.680328] {763}[Hardware Error]:   port_type: 0, PCIe end point
>> [ 3157.680329] {763}[Hardware Error]:   version: 4.0
>> [ 3157.680332] {763}[Hardware Error]:   command: 0x0000, status: 0x0010
>> [ 3157.698757] {763}[Hardware Error]:   device_id: 0000:82:00.0
>> [ 3157.698758] {763}[Hardware Error]:   slot: 0
>> [ 3157.698759] {763}[Hardware Error]:   secondary_bus: 0x00
>> [ 3157.698760] {763}[Hardware Error]:   vendor_id: 0x8086, device_id: 0x10fb
>> [ 3157.698761] {763}[Hardware Error]:   class_code: 000002
>> [ 3157.698825] Kernel panic - not syncing: Fatal hardware error!
>>
>> This issue was imported by the patch, '37448adfc7ce ("aerdrv: Move
>> cper_print_aer() call out of interrupt context")'. To fix this issue,
>> this patch adds print of AER info in cper_print_pcie() for fatal error.
>>
>> Here is the example log after this patch applied:
>> [ 7032.893566] {24}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 10
>> [ 7032.901965] {24}[Hardware Error]: event severity: fatal
>> [ 7032.907166] {24}[Hardware Error]:  Error 0, type: fatal
>> [ 7032.912366] {24}[Hardware Error]:   section_type: PCIe error
>> [ 7032.917998] {24}[Hardware Error]:   port_type: 0, PCIe end point
>> [ 7032.923974] {24}[Hardware Error]:   version: 4.0
>> [ 7032.928569] {24}[Hardware Error]:   command: 0x0546, status: 0x4010
>> [ 7032.934806] {24}[Hardware Error]:   device_id: 0000:01:00.0
>> [ 7032.940352] {24}[Hardware Error]:   slot: 0
>> [ 7032.944514] {24}[Hardware Error]:   secondary_bus: 0x00
>> [ 7032.949714] {24}[Hardware Error]:   vendor_id: 0x15b3, device_id: 0x1019
>> [ 7032.956381] {24}[Hardware Error]:   class_code: 000002
>> [ 7032.961495] {24}[Hardware Error]:   aer_uncor_status: 0x00040000, aer_uncor_mask: 0x00000000
>> [ 7032.969891] {24}[Hardware Error]:   aer_uncor_severity: 0x00062010
>> [ 7032.976042] {24}[Hardware Error]:   TLP Header: 000000c0 01010000 00000001 00000000
>> [ 7032.983663] Kernel panic - not syncing: Fatal hardware error!
> 
>> Fixes: 37448adfc7ce ("aerdrv: Move cper_print_aer() call out of
>> interrupt context")
> 
> (Please put this all on one line)
> 

OK.

>> diff --git a/drivers/firmware/efi/cper.c b/drivers/firmware/efi/cper.c
>> index 8fa977c..bf8600d 100644
>> --- a/drivers/firmware/efi/cper.c
>> +++ b/drivers/firmware/efi/cper.c
>> @@ -390,6 +390,19 @@ static void cper_print_pcie(const char *pfx, const struct cper_sec_pcie *pcie,
>>  		printk(
>>  	"%s""bridge: secondary_status: 0x%04x, control: 0x%04x\n",
>>  	pfx, pcie->bridge.secondary_status, pcie->bridge.control);
> 
> It may be worth a comment explaining why we only do this for fatal errors. Something like:
> | /* Fatal errors call __ghes_panic() before the AER handler gets to print this */
> 

OK. I will add this comment.

> 
>> +	if (pcie->validation_bits & CPER_PCIE_VALID_AER_INFO &&
>> +	    gdata->error_severity & CPER_SEV_FATAL) {
>> +		struct aer_capability_regs *aer;
>> +
>> +		aer = (struct aer_capability_regs *)pcie->aer_info;
>> +		printk("%saer_uncor_status: 0x%08x, aer_uncor_mask: 0x%08x\n",
> 
> The convention in the rest of the file is for the prefix format string to be separate. i.e:
> | "%s""aer_uncor_status: ..."
> 
> Could it be the same for consistency?
>

That way is not accepted by checkpatch.pl anymore, and was not used in some new commit.
Such as :
printk("%ssection_type: ARM processor error\n", newpfx);
and
printk("%ssection_type: IA32/X64 processor error\n", newpfx);

>> +		       pfx, aer->uncor_status, aer->uncor_mask);
>> +		printk("%saer_uncor_severity: 0x%08x\n",
>> +		       pfx, aer->uncor_severity);
>> +		printk("%sTLP Header: %08x %08x %08x %08x\n", pfx,
>> +		       aer->header_log.dw0, aer->header_log.dw1,
>> +		       aer->header_log.dw2, aer->header_log.dw3);
>> +	}
>>  }
> 
> Regardless,
> Reviewed-by; James Morse <james.morse@xxxxxxx>
> 
> 
> Thanks,
> 
> James
> 
> .
> 

-- 
 thanks
tanxiaofei




[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Security]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [ECOS]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux