Re: [PATCH v1 2/2] PCI/AER: Stop printing vendor/device ID

On Wed, May 30, 2018 at 11:18:35AM -0700, Rajat Jain wrote:
> On Wed, May 30, 2018 at 10:54 AM Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
> 
> > From: Bjorn Helgaas <bhelgaas@xxxxxxxxxx>
> 
> > The Vendor and Device ID of the root port that raised an AER interrupt is
> > irrelevant and already available via normal enumeration dmesg logging or
> > lspci.
> 
> Er, what is getting printed is not the vendor/device id of the root port
> but that of the AER source device (the one that root port got an ERR_*
> message from). In case of fatal AERs, the end point device may become
> inaccessible so lspci will not be available, and enumeration logs (from
> boot) may have gotten rolled over. So I think it is still better to print
> this information here.

Thanks for looking this over!

You're right, "dev" here is not necessarily the Root Port, so this
changelog is bogus.  "dev" came from e_info->dev[] from
aer_process_err_devices().

To be more precise, I think aer_irq() reads the Root Port's
PCI_ERR_ROOT_ERR_SRC register, which gives us the Requester ID from
the ERR_* message.  Then find_source_device() walks the tree, starting
with the Root Port, looking for:

  - a device that matches the Requester ID, or
  - a device that doesn't match the Requester ID (e.g., because a VMD
    port clears the source ID) but has AER enabled and has logged an
    error of the same type (ERR_COR vs ERR_FATAL/NONFATAL) we're
    currently decoding

So there might be multiple "dev" pointers in e_info->dev[] because
several devices could have logged errors.
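
For illustration only, the two matching rules above can be modeled in
user-space C.  This is a simplified sketch, not the kernel's code:
struct model_dev, is_candidate_source(), and collect_sources() are
hypothetical names, the tree walk is flattened into an array, and any
additional conditions the real find_source_device() path applies are
left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Which kind of ERR_* message the Root Port is currently decoding. */
enum err_class { ERR_CLASS_COR, ERR_CLASS_UNCOR };

/* Hypothetical stand-in for the few per-device facts the walk needs. */
struct model_dev {
	uint16_t requester_id;	/* (bus << 8) | devfn */
	bool aer_enabled;	/* error reporting enabled on the device */
	uint32_t cor_status;	/* logged correctable errors */
	uint32_t uncor_status;	/* logged uncorrectable errors */
};

/*
 * Rule 1: the device matches the Requester ID from the ERR_* message.
 * Rule 2: it does not match, but AER is enabled and it has logged an
 *         error of the class being decoded.
 */
static bool is_candidate_source(const struct model_dev *d, uint16_t src_id,
				enum err_class cls)
{
	if (d->requester_id == src_id)
		return true;
	if (!d->aer_enabled)
		return false;
	return (cls == ERR_CLASS_COR ? d->cor_status : d->uncor_status) != 0;
}

/* Collect every candidate, since several devices may have logged errors. */
static size_t collect_sources(const struct model_dev *devs, size_t n,
			      uint16_t src_id, enum err_class cls,
			      const struct model_dev **out, size_t out_max)
{
	size_t found = 0;

	for (size_t i = 0; i < n && found < out_max; i++)
		if (is_candidate_source(&devs[i], src_id, cls))
			out[found++] = &devs[i];
	return found;
}
```

A return value of 0 from collect_sources() corresponds to the "can't
find device" case, and a value above 1 to the multiple-"dev" case.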

I'm not convinced the vendor/device ID is that useful because there
might be several devices with the same ID, so it doesn't really tell
you which one.  The Requester ID (bus/device/function) is the
important thing.
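
As a concrete reminder of what the Requester ID encodes, here is a
small standalone sketch (user-space C, not kernel code) that splits
the 32-bit Error Source Identification value into its ERR_COR and
ERR_FATAL/NONFATAL source IDs and decodes each into
bus/device/function.  The helper names (decode_requester_id(),
cor_source_id(), uncor_source_id(), format_bdf()) are made up for this
example; the bit layout is the standard one (bus in bits 15:8, device
in 7:3, function in 2:0; ERR_COR source in the low 16 bits of the
register, ERR_FATAL/NONFATAL source in the high 16 bits).

```c
#include <stdint.h>
#include <stdio.h>

/* Bus/device/function triple carried in a 16-bit Requester ID. */
struct bdf {
	uint8_t bus;	/* bits 15:8 */
	uint8_t dev;	/* bits 7:3 */
	uint8_t fn;	/* bits 2:0 */
};

static struct bdf decode_requester_id(uint16_t id)
{
	struct bdf b = {
		.bus = (uint8_t)(id >> 8),
		.dev = (uint8_t)((id >> 3) & 0x1f),
		.fn  = (uint8_t)(id & 0x07),
	};
	return b;
}

/* Low half of the source register: ERR_COR source ID. */
static uint16_t cor_source_id(uint32_t err_src)
{
	return (uint16_t)(err_src & 0xffff);
}

/* High half of the source register: ERR_FATAL/NONFATAL source ID. */
static uint16_t uncor_source_id(uint32_t err_src)
{
	return (uint16_t)(err_src >> 16);
}

/* Format as the familiar bb:dd.f string (domain omitted). */
static int format_bdf(char *buf, size_t len, struct bdf b)
{
	return snprintf(buf, len, "%02x:%02x.%d", b.bus, b.dev, b.fn);
}
```

Two devices with the same vendor/device ID still decode to different
bus/device/function values here, which is why the Requester ID
identifies the source and the vendor/device pair does not.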

The current code is not ideal because the find_source_device() path
depends on the pci_dev still being present and even accessible (so we
can read DEVCTL, ERR_COR_STATUS, etc.), which might not be the case.

If find_source_device() fails, i.e., it can't find a matching pci_dev
and prints the "can't find device of ID%04x" message, we're in real
trouble because we don't call aer_process_err_devices(), which means
we don't clear PCI_ERR_COR_STATUS.

Anyway, I'll abandon this change for now since it's not a clear
improvement.

> > Remove the Vendor and Device ID from AER logging.
> 
> > Signed-off-by: Bjorn Helgaas <bhelgaas@xxxxxxxxxx>
> > ---
> >   drivers/pci/pcie/aer/aerdrv_errprint.c |    5 ++---
> >   1 file changed, 2 insertions(+), 3 deletions(-)
> 
> > diff --git a/drivers/pci/pcie/aer/aerdrv_errprint.c b/drivers/pci/pcie/aer/aerdrv_errprint.c
> > index d7fde8368d81..16116844531c 100644
> > --- a/drivers/pci/pcie/aer/aerdrv_errprint.c
> > +++ b/drivers/pci/pcie/aer/aerdrv_errprint.c
> > @@ -175,9 +175,8 @@ void aer_print_error(struct pci_dev *dev, struct aer_err_info *info)
> >                  aer_error_severity_string[info->severity],
> >                  aer_error_layer[layer], aer_agent_string[agent]);
> 
> > -       pci_err(dev, "  device [%04x:%04x] error status/mask=%08x/%08x\n",
> > -               dev->vendor, dev->device,
> > -               info->status, info->mask);
> > +       pci_err(dev, "  error status/mask=%08x/%08x\n", info->status,
> > +               info->mask);
> 
> >          __aer_print_error(dev, info);


