On 4/23/2019 5:42 PM, Alex Williamson wrote:
> The PCIe bandwidth notification service generates logging any time a
> link changes speed or width to a state that is considered downgraded.
> Unfortunately, it cannot differentiate signal integrity related link
> changes from those intentionally initiated by an endpoint driver,
> including drivers that may live in userspace or VMs when making use
> of vfio-pci.  Therefore, allow the driver to have a say in whether
> the link is indeed downgraded and worth noting in the log, or if the
> change is perhaps intentional.
>
> For vfio-pci, we don't know the intentions of the user/guest driver
> either, but we do know that GPU drivers in guests actively manage
> the link state and therefore trigger the bandwidth notification for
> what appear to be entirely intentional link changes.
>
> Fixes: e8303bb7a75c PCI/LINK: Report degraded links via link bandwidth notification
> Link: https://lore.kernel.org/linux-pci/155597243666.19387.1205950870601742062.stgit@xxxxxxxxxx/T/#u
> Signed-off-by: Alex Williamson <alex.williamson@xxxxxxxxxx>
> ---
>
> Changing to pci_dbg() logging is not super usable, so let's try the
> previous idea of letting the driver handle link change events as they
> see fit.  Ideally this might be two patches, but for easier handling,
> folding the pci and vfio-pci bits together.  Comments?  Thanks,

I think this callback opens up a can of worms: drivers can kill, ad hoc,
a number of what would otherwise be indicators of problems. But I don't
have to like it to review it :).

>  drivers/pci/probe.c         |   13 +++++++++++++
>  drivers/vfio/pci/vfio_pci.c |   10 ++++++++++
>  include/linux/pci.h         |    3 +++
>  3 files changed, 26 insertions(+)
>
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index 7e12d0163863..233cd4b5b6e8 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -2403,6 +2403,19 @@ void pcie_report_downtraining(struct pci_dev *dev)

I don't think you want to change pcie_report_downtraining(). By
nomenclature, you're advertising that it "reports" something, but then
you turn around and also call a notification callback. This is also used
during probe, and you've now just killed your chance to notice that
you've booted with a degraded link.
If what you want to do is silence the bandwidth notification, you want
to modify the threaded interrupt handler that calls this.

>  	if (PCI_FUNC(dev->devfn) != 0 || dev->is_virtfn)
>  		return;
>
> +	/*
> +	 * If driver handles link_change event, defer to driver.  PCIe drivers
> +	 * can call pcie_print_link_status() to print current link info.
> +	 */
> +	device_lock(&dev->dev);
> +	if (dev->driver && dev->driver->err_handler &&
> +	    dev->driver->err_handler->link_change) {
> +		dev->driver->err_handler->link_change(dev);
> +		device_unlock(&dev->dev);
> +		return;
> +	}
> +	device_unlock(&dev->dev);

Can we write this such that there is a single lock()/unlock() pair?

> +
>  	/* Print link status only if the device is constrained by the fabric */
>  	__pcie_print_link_status(dev, false);
>  }
> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> index cab71da46f4a..c9ffc0ccabb3 100644
> --- a/drivers/vfio/pci/vfio_pci.c
> +++ b/drivers/vfio/pci/vfio_pci.c
> @@ -1418,8 +1418,18 @@ static pci_ers_result_t vfio_pci_aer_err_detected(struct pci_dev *pdev,
>  	return PCI_ERS_RESULT_CAN_RECOVER;
>  }
>
> +/*
> + * Ignore link change notification, we can't differentiate signal related
> + * link changes from user driver power management type operations, so do
> + * nothing.  Potentially this could be routed out to the user.
> + */
> +static void vfio_pci_link_change(struct pci_dev *pdev)
> +{
> +}
> +
>  static const struct pci_error_handlers vfio_err_handlers = {
>  	.error_detected = vfio_pci_aer_err_detected,
> +	.link_change = vfio_pci_link_change,
>  };
>
>  static struct pci_driver vfio_pci_driver = {
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 27854731afc4..e9194bc03f9e 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -763,6 +763,9 @@ struct pci_error_handlers {
>
>  	/* Device driver may resume normal operations */
>  	void (*resume)(struct pci_dev *dev);
> +
> +	/* PCIe link change notification */
> +	void (*link_change)(struct pci_dev *dev);
>  };
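
To illustrate the single lock()/unlock() pair asked for in the
pcie_report_downtraining() hunk, here is a rough sketch of one possible
shape: record whether the driver handled the event, unlock once, and
print only if it did not. This is a userspace mock, not kernel code.
Every mock_* type and function, the counters, and quiet_link_change()
are hypothetical stand-ins invented for the illustration; only the
control flow is meant to carry over.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins for the kernel structures in the patch. */
struct mock_pci_dev;

struct mock_error_handlers {
	void (*link_change)(struct mock_pci_dev *dev);
};

struct mock_driver {
	const struct mock_error_handlers *err_handler;
};

struct mock_pci_dev {
	struct mock_driver *driver;
	int lock_depth;		/* stand-in for the device_lock() state */
	int reports_printed;	/* stand-in for __pcie_print_link_status() calls */
	int callbacks_run;	/* number of link_change callbacks invoked */
};

/* Mock lock helpers; the asserts catch any unbalanced lock/unlock. */
static void mock_device_lock(struct mock_pci_dev *dev)
{
	assert(dev->lock_depth == 0);
	dev->lock_depth++;
}

static void mock_device_unlock(struct mock_pci_dev *dev)
{
	assert(dev->lock_depth == 1);
	dev->lock_depth--;
}

static void mock_print_link_status(struct mock_pci_dev *dev)
{
	dev->reports_printed++;
}

/* Example handler: swallows the event, as the vfio-pci stub in the patch does. */
void quiet_link_change(struct mock_pci_dev *dev)
{
	assert(dev->lock_depth == 1);	/* invoked with the lock held */
	dev->callbacks_run++;
}

/* Same behaviour as the quoted hunk, but with one lock/unlock pair. */
void mock_report_downtraining(struct mock_pci_dev *dev)
{
	bool handled = false;

	mock_device_lock(dev);
	if (dev->driver && dev->driver->err_handler &&
	    dev->driver->err_handler->link_change) {
		dev->driver->err_handler->link_change(dev);
		handled = true;
	}
	mock_device_unlock(dev);

	/* Fall back to the generic report only if no driver took the event. */
	if (!handled)
		mock_print_link_status(dev);
}
```

As in the quoted hunk, the callback still runs under the lock and the
generic print still happens after the lock is dropped; only the
duplicated unlock on the early-return path goes away. This does not
address the separate concern about hooking the callback into
pcie_report_downtraining() in the first place.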