On Tue, May 22, 2018 at 03:28:05PM -0700, Rajat Jain wrote:
> Add the PCI AER statistics details to
> Documentation/PCI/pcieaer-howto.txt
>
> Signed-off-by: Rajat Jain <rajatja@xxxxxxxxxx>
> ---
>  Documentation/PCI/pcieaer-howto.txt | 35 +++++++++++++++++++++++++++++
>  1 file changed, 35 insertions(+)
>
> diff --git a/Documentation/PCI/pcieaer-howto.txt b/Documentation/PCI/pcieaer-howto.txt
> index acd0dddd6bb8..86ee9f9ff5e1 100644
> --- a/Documentation/PCI/pcieaer-howto.txt
> +++ b/Documentation/PCI/pcieaer-howto.txt
> @@ -73,6 +73,41 @@ In the example, 'Requester ID' means the ID of the device who sends
>  the error message to root port. Pls. refer to pci express specs for
>  other fields.
>
> +2.4 AER statistics
> +
> +When AER messages are captured, the statistics are exposed via the following
> +sysfs attributes under the "aer_stats" folder for the device:
> +
> +2.4.1 Device sysfs Attributes
> +
> +These attributes show up under all the devices that are AER capable. These
> +indicate the errors "as seen by the device". Note that this may mean that if
> +an end point is causing problems, the AER counters may increment at its link
> +partner (e.g. root port) because the errors will be "seen" by the link partner
> +and not the problematic end point itself (which may report all counters
> +as 0 as it never saw any problems).
> +
> + * dev_total_cor_errs: number of correctable errors seen by the device.
> + * dev_total_fatal_errs: number of fatal uncorrectable errors seen by the device.
> + * dev_total_nonfatal_errs: number of nonfatal uncorr errors seen by the device.
> + * dev_breakdown_correctable: Provides a breakdown of different types of
> +   correctable errors seen.
> + * dev_breakdown_uncorrectable: Provides a breakdown of different types of
> +   uncorrectable errors seen.
> +
> +2.4.2 Rootport sysfs Attributes
> +
> +These attributes show up under only the rootports that are AER capable. These
> +indicate the number of error messages as "reported to" the rootport. Please note
> +that the rootports also transmit (internally) the ERR_* messages for errors seen
> +by the internal rootport PCI device, so these counters include them and are
> +thus cumulative of all the error messages on the PCI hierarchy originating
> +at that root port.
> +
> + * rootport_total_cor_errs: number of ERR_COR messages reported to rootport.
> + * rootport_total_fatal_errs: number of ERR_FATAL messages reported to rootport.
> + * rootport_total_nonfatal_errs: number of ERR_NONFATAL messages reported to
> +   rootport.

These all belong in Documentation/ABI/ please.

thanks,

greg k-h
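
For anyone who wants to look at the counters described in the quoted patch from userspace, here is a rough Python sketch that reads the total counters from a device's "aer_stats" folder. The attribute names are the ones listed in the patch as posted, so whatever is finally merged may name them differently. The assumption that each file holds a single decimal integer, the helper name read_aer_totals(), and the example device path are all mine and not from the patch.

```python
from pathlib import Path

# Attribute names copied from the quoted patch; the interface that is
# eventually merged may use different names.
TOTAL_ATTRS = (
    "dev_total_cor_errs",
    "dev_total_fatal_errs",
    "dev_total_nonfatal_errs",
    "rootport_total_cor_errs",
    "rootport_total_fatal_errs",
    "rootport_total_nonfatal_errs",
)


def read_aer_totals(device_dir):
    """Return {attribute: value} for the total counters found under
    <device_dir>/aer_stats. Attributes that do not exist are skipped,
    so an endpoint simply yields no rootport_* entries."""
    stats_dir = Path(device_dir) / "aer_stats"
    totals = {}
    for name in TOTAL_ATTRS:
        attr = stats_dir / name
        if not attr.is_file():
            continue
        text = attr.read_text().strip()
        try:
            # Assumption: the file holds one decimal integer.
            totals[name] = int(text)
        except ValueError:
            # The patch does not specify the format; keep the raw text.
            totals[name] = text
    return totals


if __name__ == "__main__":
    # Hypothetical device address; substitute one from your own system.
    for name, value in read_aer_totals("/sys/bus/pci/devices/0000:00:1c.0").items():
        print(f"{name}: {value}")
```

Since errors are counted where they are "seen", it is worth running this against both an endpoint and its link partner: a misbehaving endpoint may show all zeros while the root port's counters climb.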