[+cc Alex] On Mon, Dec 16, 2019 at 06:30:11PM +0800, Yicong Yang wrote: > On 2019/12/14 6:57, Bjorn Helgaas wrote: > > On Fri, Dec 13, 2019 at 07:44:34PM +0800, Yicong Yang wrote: > >> The PCI error recovery will fail if any device under > >> root port doesn't have an error_detected callback. > >> Currently only failure result is printed, which is > >> not enough to determine which device leads to the > >> failure and the detailed failure reason. > >> > >> Add print information if certain device under root > >> port has no error_detected callback. > >> > >> Signed-off-by: Yicong Yang <yangyicong@xxxxxxxxxxxxx> > > Applied to pci/aer for v5.6, thanks! > > > > I also added a trivial follow-on patch to factor out the "AER: " > > prefix (attached below). This code is now used by DPC as well as AER, > > so "AER: " might not be exactly the correct prefix, but I didn't try > > to untangle that. > > > >> --- > >> drivers/pci/pcie/err.c | 4 +++- > >> 1 file changed, 3 insertions(+), 1 deletion(-) > >> > >> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c > >> index b0e6048..ec37c33 100644 > >> --- a/drivers/pci/pcie/err.c > >> +++ b/drivers/pci/pcie/err.c > >> @@ -61,8 +61,10 @@ static int report_error_detected(struct pci_dev *dev, > >> * error callbacks of "any" device in the subtree, and will > >> * exit in the disconnected error state. > >> */ > >> - if (dev->hdr_type != PCI_HEADER_TYPE_BRIDGE) > >> + if (dev->hdr_type != PCI_HEADER_TYPE_BRIDGE) { > >> vote = PCI_ERS_RESULT_NO_AER_DRIVER; > >> + pci_info(dev, "AER: Device has no error_detected callback\n"); > >> + } > >> else > >> vote = PCI_ERS_RESULT_NONE; > >> } else { > >> -- > >> 2.8.1 > >> > > commit 9694ef043ea4 ("PCI/AER: Factor message prefixes with dev_fmt()") > > Author: Bjorn Helgaas <bhelgaas@xxxxxxxxxx> > > Date: Fri Dec 13 16:46:05 2019 -0600 > > > > PCI/AER: Factor message prefixes with dev_fmt() > > > > Define dev_fmt() with the common prefix of log messages so we don't have to > > repeat it in every printk. No functional change intended. > > > > Signed-off-by: Bjorn Helgaas <bhelgaas@xxxxxxxxxx> > > > > diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c > > index abde5e000f5d..747ef8b41d1f 100644 > > --- a/drivers/pci/pcie/err.c > > +++ b/drivers/pci/pcie/err.c > > @@ -10,6 +10,8 @@ > > * Zhang Yanmin (yanmin.zhang@xxxxxxxxx) > > */ > > > > +#define dev_fmt(fmt) "AER: " fmt > > + > > #include <linux/pci.h> > > #include <linux/module.h> > > #include <linux/kernel.h> > > @@ -64,7 +66,7 @@ static int report_error_detected(struct pci_dev *dev, > > */ > > if (dev->hdr_type != PCI_HEADER_TYPE_BRIDGE) { > > vote = PCI_ERS_RESULT_NO_AER_DRIVER; > > - pci_info(dev, "AER: Driver has no error_detected callback\n"); > > + pci_info(dev, "driver has no error_detected callback\n"); > > I think use "device" here is more proper. Sometimes the device is > even not bind to the driver, which will also lead to the recovery > failure. Hmmm, maybe so, although the error_detected callback is definitely a property of the *driver*, not the device. What would you think of something like this? pci_info(dev, dev->driver ? "driver not capable of error recovery\n" : "no driver\n"); I am curious about the larger question of why we fail if the device has no driver. I understand why we fail the recovery for a device with a driver that has no err_handlers: we can't just reset the device out from under the driver. But why do we fail if a device has no driver at all? Shouldn't it be safe to reset such a device? The commit log of 918b4053184c ("PCI/AER: Report success only when every device has AER-aware driver") suggests that it has something to do with KVM and the pci-stub driver, but I don't understand it. > > } else { > > vote = PCI_ERS_RESULT_NONE; > > } > > @@ -236,12 +238,12 @@ void pcie_do_recovery(struct pci_dev *dev, enum pci_channel_state state, > > > > pci_aer_clear_device_status(dev); > > pci_cleanup_aer_uncorrect_error_status(dev); > > - pci_info(dev, "AER: Device recovery successful\n"); > > + pci_info(dev, "device recovery successful\n"); > > return; > > > > failed: > > pci_uevent_ers(dev, PCI_ERS_RESULT_DISCONNECT); > > > > /* TODO: Should kernel panic here? */ > > - pci_info(dev, "AER: Device recovery failed\n"); > > + pci_info(dev, "device recovery failed\n"); > > }