On Wed, May 30, 2018 at 11:18:35AM -0700, Rajat Jain wrote:
> On Wed, May 30, 2018 at 10:54 AM Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
>
> > From: Bjorn Helgaas <bhelgaas@xxxxxxxxxx>
>
> > The Vendor and Device ID of the root port that raised an AER interrupt is
> > irrelevant and already available via normal enumeration dmesg logging or
> > lspci.
>
> Er, what is getting printed is not the vendor/device id of the root port
> but that of the AER source device (the one that root port got an ERR_*
> message from). In case of fatal AERs, the end point device may become
> inaccessible so lspci will not be available, and enumeration logs (from
> boot) may have gotten rolled over. So I think it is still better to print
> this information here.

Thanks for looking this over!  You're right, "dev" here is not
necessarily the Root Port, so this changelog is bogus.  "dev" came
from e_info->dev[] from aer_process_err_devices().

I think to be more precise, aer_irq() reads the Root Port's
PCI_ERR_ROOT_ERR_SRC register, which gives us the Requester ID from
the ERR_* message.  Then find_source_device() walks the tree starting
with the Root Port, looking for:

  - a device that matches the Requester ID, or

  - a device that doesn't match the Requester ID (e.g., because a VMD
    port clears the source ID) but has AER enabled and has logged an
    error of the same type (ERR_COR vs ERR_FATAL/NONFATAL) we're
    currently decoding

So there might be multiple "dev" pointers in e_info->dev[] because
several devices could have logged errors.

I'm not convinced the vendor/device ID is that useful because there
might be several devices with the same ID, so it doesn't really tell
you which one.  The Requester ID (bus/device/function) is the
important thing.

The current code is not ideal because the find_source_device() path
depends on the pci_dev still being present and even accessible (so we
can read DEVCTL, ERR_COR_STATUS, etc), which might not be the case.

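To make the matching rule described above concrete, here is a rough
userspace sketch.  It is not the kernel code: struct mock_dev, enum
err_type, is_candidate() and collect_candidates() are made-up names for
illustration, and it only models the two conditions listed above, not
the real tree walk or any config space access.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Everything here is a hypothetical stand-in, not a kernel definition. */
enum err_type { ERR_TYPE_COR, ERR_TYPE_UNCOR };

struct mock_dev {
	uint16_t req_id;	/* Requester ID: bus << 8 | devfn */
	bool aer_enabled;	/* AER error reporting enabled */
	bool logged_cor;	/* has a correctable error logged */
	bool logged_uncor;	/* has a fatal/nonfatal error logged */
};

/* True if dev is a plausible source for an error of the given type. */
static bool is_candidate(const struct mock_dev *dev, uint16_t src_id,
			 enum err_type type)
{
	/* Case 1: the Requester ID from the ERR_* message matches. */
	if (dev->req_id == src_id)
		return true;

	/* Case 2: ID differs, but AER is on and a same-type error is logged. */
	if (!dev->aer_enabled)
		return false;
	return type == ERR_TYPE_COR ? dev->logged_cor : dev->logged_uncor;
}

/* Collect up to max_out candidates into out[]; returns how many. */
static size_t collect_candidates(const struct mock_dev *devs, size_t ndevs,
				 uint16_t src_id, enum err_type type,
				 const struct mock_dev **out, size_t max_out)
{
	size_t i, n = 0;

	assert(devs != NULL && out != NULL);
	for (i = 0; i < ndevs && n < max_out; i++)
		if (is_candidate(&devs[i], src_id, type))
			out[n++] = &devs[i];
	return n;
}
```

Note that more than one entry can come back, which is the "multiple dev
pointers" situation, and that two devices with the same vendor/device ID
would be indistinguishable here except by req_id.
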
If find_source_device() fails, i.e., it can't find a matching pci_dev
and prints the "can't find device of ID%04x" message, we're in real
trouble because we don't call aer_process_err_devices(), which means
we don't clear PCI_ERR_COR_STATUS.

Anyway, I'll abandon this change for now since it's not a clear
improvement.

> > Remove the Vendor and Device ID from AER logging.
>
> > Signed-off-by: Bjorn Helgaas <bhelgaas@xxxxxxxxxx>
> > ---
> >   drivers/pci/pcie/aer/aerdrv_errprint.c | 5 ++---
> >   1 file changed, 2 insertions(+), 3 deletions(-)
>
> > diff --git a/drivers/pci/pcie/aer/aerdrv_errprint.c
> b/drivers/pci/pcie/aer/aerdrv_errprint.c
> > index d7fde8368d81..16116844531c 100644
> > --- a/drivers/pci/pcie/aer/aerdrv_errprint.c
> > +++ b/drivers/pci/pcie/aer/aerdrv_errprint.c
> > @@ -175,9 +175,8 @@ void aer_print_error(struct pci_dev *dev, struct
> aer_err_info *info)
> >                 aer_error_severity_string[info->severity],
> >                 aer_error_layer[layer], aer_agent_string[agent]);
>
> > -       pci_err(dev, " device [%04x:%04x] error status/mask=%08x/%08x\n",
> > -               dev->vendor, dev->device,
> > -               info->status, info->mask);
> > +       pci_err(dev, " error status/mask=%08x/%08x\n", info->status,
> > +               info->mask);
>
> >         __aer_print_error(dev, info);
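
Since the Requester ID (bus/device/function) is what actually identifies
the device, here is a rough sketch of pulling it out of the value read
from the Root Port's PCI_ERR_ROOT_ERR_SRC register.  It assumes the
Error Source Identification layout from the PCIe spec (ERR_COR source ID
in bits 15:0, ERR_FATAL/NONFATAL source ID in bits 31:16).  struct bdf
and the helper names are made up for illustration and are not kernel
interfaces.

```c
#include <stdint.h>

/* Hypothetical helper type for this sketch. */
struct bdf {
	uint8_t bus;	/* bits 15:8 of the Requester ID */
	uint8_t dev;	/* bits 7:3 */
	uint8_t fn;	/* bits 2:0 */
};

/* Source ID of the ERR_COR message (low half of the register). */
static uint16_t src_cor_id(uint32_t err_src)
{
	return (uint16_t)(err_src & 0xffff);
}

/* Source ID of the ERR_FATAL/NONFATAL message (high half). */
static uint16_t src_uncor_id(uint32_t err_src)
{
	return (uint16_t)(err_src >> 16);
}

/* Split a 16-bit Requester ID into bus, device and function. */
static struct bdf decode_requester_id(uint16_t id)
{
	struct bdf b;

	b.bus = (uint8_t)(id >> 8);
	b.dev = (uint8_t)((id >> 3) & 0x1f);
	b.fn = (uint8_t)(id & 0x07);
	return b;
}
```

For example, a register value of 0x01080300 would decode to 03:00.0 for
the correctable source and 01:01.0 for the fatal/nonfatal source.
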