Hi Keith,
On 12/14/20 10:23 PM, Keith Busch wrote:
> On Wed, Dec 09, 2020 at 11:55:07PM +0100, Hinko Kocevar wrote:
>> Adding a bunch of printk()'s to portdrv_pci.c led to (partial) success!
>> So, the pcie_portdrv_error_detected() returns PCI_ERS_RESULT_CAN_RECOVER and
>> therefore the pcie_portdrv_slot_reset() is not called.
>> But the pcie_portdrv_err_resume() is called! Adding these two lines to
>> pcie_portdrv_err_resume(), before the call to device_for_each_child():
>>          pci_restore_state(dev);
>>          pci_save_state(dev);
>
> You need to do that with the current kernel or are you still using a
> 3.10? A more recent kernel shouldn't have needed such a fix after the

This was tested on the 5.9.12 kernel at that time. As of today, I've
re-run the tests on Bjorn's git tree, pci/err branch from Sunday (I
guess 5.10.0 version).

> following commit was introduced:
>   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=874b3251113a1e2cbe79c24994dc03fe4fe4b99b

I noticed the change you are pointing out when trying to propose a patch.
It made me curious about why pcie_portdrv_slot_reset() is not invoked.
After sprinkling a couple of printk()'s around pcie_do_recovery() and
the pcie_portdrv_err_handler callbacks, I can observe that
pcie_portdrv_slot_reset() is never called from pcie_do_recovery()
because the status returned by reset_subordinates() (actually
aer_root_reset() from pcie/aer.c) is PCI_ERS_RESULT_RECOVERED.
I reckon, in order to invoke the pcie_portdrv_slot_reset(), the
aer_root_reset() should have returned PCI_ERS_RESULT_NEED_RESET.
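To make the flow above easier to follow, here is a rough userspace
model of the frozen-channel path, not kernel code. It assumes that the
value returned by the reset callback replaces the status collected from
the drivers' error_detected callbacks, and that any reset result other
than "recovered" is treated as a failure. The enum, struct trace and
model_frozen_recovery() are hypothetical stand-ins; only the names in
the comments refer to real kernel symbols, and the details may differ
between kernel versions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for the kernel's pci_ers_result_t values (numeric values arbitrary). */
enum ers_result {
	ERS_CAN_RECOVER,
	ERS_NEED_RESET,
	ERS_DISCONNECT,
	ERS_RECOVERED,
};

/* Hypothetical trace of which broadcasts the model performed. */
struct trace {
	bool slot_reset_called;
	bool resume_called;
};

/*
 * Simplified model of the recovery sequence for a frozen channel.
 * detected: merged vote from the drivers' error_detected callbacks.
 * reset:    value returned by the reset callback (aer_root_reset() above).
 */
static enum ers_result model_frozen_recovery(enum ers_result detected,
					     enum ers_result reset,
					     struct trace *t)
{
	enum ers_result status = detected;

	assert(t != NULL);
	t->slot_reset_called = false;
	t->resume_called = false;

	/* Assumption: the reset result overwrites whatever the drivers voted for. */
	status = reset;
	if (status != ERS_RECOVERED)
		return status;	/* treated as a failure, no further broadcasts */

	/* Never taken in this model: status can only be ERS_RECOVERED here. */
	if (status == ERS_NEED_RESET) {
		t->slot_reset_called = true;	/* would be the slot_reset broadcast */
		status = ERS_RECOVERED;
	}

	t->resume_called = true;	/* would be the resume broadcast */
	return status;
}
```

Calling model_frozen_recovery(ERS_NEED_RESET, ERS_RECOVERED, &t) leaves
t.slot_reset_called false and t.resume_called true, which matches what
the printk()'s show: only the resume callback gets a chance to run.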
As soon as I plug the calls to pci_restore_state() and pci_save_state()
into the pcie_portdrv_err_resume() the bus and devices are operational
again.