On Fri, Oct 09, 2020 at 03:52:51AM +0100, Hedi Berriche wrote: > Commit 6d2c89441571 ("PCI/ERR: Update error status after reset_link()") > changed pcie_do_recovery() so that status is updated with the return > value from reset_link(); this was to fix the problem where we would > wrongly report recovery failure, despite a successful reset_link(), > whenever the initial error status is PCI_ERS_RESULT_DISCONNECT or > PCI_ERS_RESULT_NO_AER_DRIVER. > > Unfortunately this breaks the flow of pcie_do_recovery() as it prevents What is the reference to "this breaks" above? > the actions needed when the initial error is PCI_ERS_RESULT_CAN_RECOVER > or PCI_ERS_RESULT_NEED_RESET from taking place which causes error > recovery to fail. > > Don't clobber status after reset_link() to restore the intended flow in > pcie_do_recovery(). > > Fix the original problem by saving the return value from reset_link() > and use it later on to decide whether error recovery should be deemed > successful in the scenarios where the initial error status is > PCI_ERS_RESULT_{DISCONNECT,NO_AER_DRIVER}. I would rather rephrase the above to make it clear what is being proposed. Since the description seems to talk about the old problem and new solution all mixed up. > > Fixes: 6d2c89441571 ("PCI/ERR: Update error status after reset_link()") > Signed-off-by: Hedi Berriche <hedi.berriche@xxxxxxx> > Cc: Russ Anderson <rja@xxxxxxx> > Cc: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@xxxxxxxxxxxxxxx> > Cc: Bjorn Helgaas <bhelgaas@xxxxxxxxxx> > Cc: Ashok Raj <ashok.raj@xxxxxxxxx> > Cc: Keith Busch <keith.busch@xxxxxxxxx> > Cc: Joerg Roedel <jroedel@xxxxxxxx> > > Cc: stable@xxxxxxxxxx # v5.7+ > --- > drivers/pci/pcie/err.c | 13 ++++++++++--- > 1 file changed, 10 insertions(+), 3 deletions(-) > > diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c > index c543f419d8f9..dbd0b56bd6c1 100644 > --- a/drivers/pci/pcie/err.c > +++ b/drivers/pci/pcie/err.c > @@ -150,7 +150,7 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev, > pci_channel_state_t state, > pci_ers_result_t (*reset_link)(struct pci_dev *pdev)) > { > - pci_ers_result_t status = PCI_ERS_RESULT_CAN_RECOVER; > + pci_ers_result_t post_reset_status, status = PCI_ERS_RESULT_CAN_RECOVER; why call it post_reset_status? > struct pci_bus *bus; > > /* > @@ -165,8 +165,8 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev, > pci_dbg(dev, "broadcast error_detected message\n"); > if (state == pci_channel_io_frozen) { > pci_walk_bus(bus, report_frozen_detected, &status); > - status = reset_link(dev); > - if (status != PCI_ERS_RESULT_RECOVERED) { > + post_reset_status = reset_link(dev); > + if (post_reset_status != PCI_ERS_RESULT_RECOVERED) { > pci_warn(dev, "link reset failed\n"); > goto failed; > } > @@ -174,6 +174,13 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev, > pci_walk_bus(bus, report_normal_detected, &status); > } > > + if ((status == PCI_ERS_RESULT_DISCONNECT || > + status == PCI_ERS_RESULT_NO_AER_DRIVER) && > + post_reset_status == PCI_ERS_RESULT_RECOVERED) { > + /* error recovery succeeded thanks to reset_link() */ > + status = PCI_ERS_RESULT_RECOVERED; > + } > + > if (status == PCI_ERS_RESULT_CAN_RECOVER) { > status = PCI_ERS_RESULT_RECOVERED; > pci_dbg(dev, "broadcast mmio_enabled message\n"); > -- > 2.28.0 >