PCIe AER error report inconsistency in dmesg vs trace event

Hi all,

I'm using the aer-inject tool to inject fake AER events for a PCIe
device. This results in both an error message being logged (which I
can view with dmesg) and a trace event being generated
("trace_aer_event").

The AER events have a "severity" field. The value of this field
differs between the dmesg output and the perf output.

I inject a corrected error using a modified example file provided with
aer-inject:

AER
BUS 1 DEV 0 FN 0
COR_STATUS BAD_TLP
HEADER_LOG 0 1 2 3

The corresponding dmesg output seems right (severity=Corrected):

[ 3347.332137] pcieport 0000:00:02.0: AER: Corrected error received: id=0300
[ 3347.332148] i40e 0000:03:00.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=0300(Receiver ID)
[ 3347.459035] i40e 0000:03:00.0:   device [8086:1580] error status/mask=00000040/00002000
[ 3347.561953] i40e 0000:03:00.0:    [ 6] Bad TLP

However, when I output the corresponding AER trace events using
perf, the severity of the error is different (Corrected ->
Uncorrected):

     kworker/0:1 26557 [000]  3347.635532: ras:aer_event: 0000:03:00.0 PCIe Bus Error: severity=Uncorrected, non-fatal, Bad TLP

The same thing happens with trace-cmd.

     kworker/0:1-26557 [000]  3347.635526: aer_event: 0000:03:00.0 PCIe Bus Error: severity=Uncorrected, non-fatal, Bad TLP
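
For anyone who wants to check the same thing on their own system, the
comparison can be done mechanically with a small script along these
lines. This is only a rough sketch: the regular expressions and the
function name parse_severity are my own, and they assume the line
formats shown above (a "severity=" field and a domain:bus:dev.fn
device address). Other kernel versions may format these lines
differently.

```python
import re

# Assumption: the severity is the first word after "severity=".
# This yields "Corrected" for the dmesg line and "Uncorrected" for the
# trace line ("Uncorrected, non-fatal").
SEVERITY_RE = re.compile(r"severity=(\w+)")

# Assumption: the device is printed as domain:bus:device.function,
# e.g. 0000:03:00.0.
DEVICE_RE = re.compile(r"\b([0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\b")


def parse_severity(line):
    """Return (device, severity) from a dmesg or trace line, or None."""
    sev = SEVERITY_RE.search(line)
    dev = DEVICE_RE.search(line)
    if sev is None or dev is None:
        return None
    return dev.group(1), sev.group(1)


# The two lines from this report, with the mail wrapping undone.
dmesg_line = (
    "[ 3347.332148] i40e 0000:03:00.0: PCIe Bus Error: "
    "severity=Corrected, type=Data Link Layer, id=0300(Receiver ID)"
)
trace_line = (
    "kworker/0:1-26557 [000]  3347.635526: aer_event: "
    "0000:03:00.0 PCIe Bus Error: severity=Uncorrected, non-fatal, Bad TLP"
)

dmesg_dev, dmesg_sev = parse_severity(dmesg_line)
trace_dev, trace_sev = parse_severity(trace_line)

# Only compare severities when both lines refer to the same device.
if dmesg_dev == trace_dev and dmesg_sev != trace_sev:
    print(f"{dmesg_dev}: dmesg says {dmesg_sev}, trace says {trace_sev}")
```

Run against the two lines above, it reports the mismatch for
0000:03:00.0.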


I think both perf and trace-cmd use the event generated by
"trace_aer_event". I've tried this on both a VM and an actual machine,
and both give me the same result. The VM was using kernel 4.10. The
machine was using an older kernel (4.4).

Is this a bug?


Preet


