I've got a number of servers that, when APEI reporting is enabled in the
BIOS, periodically show messages like these in dmesg:
[3989185.213054] {31}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 0
[3989185.213059] {31}[Hardware Error]: APEI generic hardware error status
[3989185.213063] {31}[Hardware Error]: severity: 2, corrected
[3989185.213066] {31}[Hardware Error]: section: 0, severity: 2, corrected
[3989185.213069] {31}[Hardware Error]: flags: 0x01
[3989185.213072] {31}[Hardware Error]: primary
[3989185.213074] {31}[Hardware Error]: section_type: PCIe error
[3989185.213077] {31}[Hardware Error]: port_type: 4, root port
[3989185.213080] {31}[Hardware Error]: version: 1.16
[3989185.213083] {31}[Hardware Error]: command: 0x0010, status: 0x0547
[3989185.213086] {31}[Hardware Error]: device_id: 0000:04:00.0
[3989185.213089] {31}[Hardware Error]: slot: 0
[3989185.213091] {31}[Hardware Error]: secondary_bus: 0x00
[3989185.213093] {31}[Hardware Error]: vendor_id: 0x1000, device_id: 0x005d
[3989185.213096] {31}[Hardware Error]: class_code: 040100
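A minimal Python sketch for pulling records like the one above apart into key/value pairs, assuming every line keeps the "[timestamp] {N}[Hardware Error]: ..." layout shown here. The names parse_hw_error_lines and split_fields are made up for illustration, and SAMPLE is just a few lines copied from the output above.

```python
import re

# Matches "[timestamp] {seq}[Hardware Error]: payload" as seen in the dmesg output.
LINE_RE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+\{(?P<seq>\d+)\}\[Hardware Error\]:\s*(?P<body>.*)$"
)

# A few lines copied from the report, used only as sample input.
SAMPLE = [
    "[3989185.213077] {31}[Hardware Error]: port_type: 4, root port",
    "[3989185.213083] {31}[Hardware Error]: command: 0x0010, status: 0x0547",
    "[3989185.213086] {31}[Hardware Error]: device_id: 0000:04:00.0",
    "[3989185.213093] {31}[Hardware Error]: vendor_id: 0x1000, device_id: 0x005d",
]


def parse_hw_error_lines(lines):
    """Group the payload of each [Hardware Error] line by its {N} record number."""
    records = {}
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # not a hardware error line
        records.setdefault(int(m.group("seq")), []).append(m.group("body").strip())
    return records


def split_fields(payloads):
    """Split payloads into (key, value) pairs.

    A list of pairs is returned rather than a dict because some keys
    (device_id, severity) appear more than once in a single record.
    Text without a "key: value" shape, such as "root port", is skipped.
    """
    pairs = []
    for body in payloads:
        for part in body.split(", "):
            key, sep, value = part.partition(": ")
            if sep:
                pairs.append((key.strip(), value.strip()))
    return pairs


if __name__ == "__main__":
    for seq, payloads in parse_hw_error_lines(SAMPLE).items():
        print(seq, split_fields(payloads))
```

This only restructures what the kernel already printed; it does not interpret the command/status values.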
The kernel is on the older side: it's a SLES11 kernel (3.0.101). PCI
device 0000:04:00.0 is an LSI-based RAID controller.
All I've been able to determine is that an error happened and that it
was corrected. I have not been able to figure out (and I've tried
scouring the ACPI, UEFI, and PCIe specs) *what* happened (e.g. command
timeout, parity error, etc.).
I found this file
http://lxr.free-electrons.com/source/Documentation/acpi/apei/output_format.txt?v=3.0
that seems to outline the output format of APEI errors. It lists a
whole bunch of aer_* fields that can appear with PCIe errors, but I'm not
seeing any of those here.
My questions, specifically, are:
1. Is this event data alone enough to identify what kind of error
occurred here? If so, what error was it?
2. Should there be additional aer_ fields with the error? If so, how can
I see them, and is it up to the firmware to pass these along? Is there
some ACPI table I can query that would indicate whether or not the
hardware is configured to report them?
Thanks!
-Aaron
--
Aaron Knister
NASA Center for Climate Simulation (Code 606.2)
Goddard Space Flight Center
(301) 286-2776