On 9/2/20 11:42 AM, Andrey Grodzovsky wrote:
> This reverts commit 6d2c89441571ea534d6240f7724f518936c44f8d.
>
> In the code below
>
>         pci_walk_bus(bus, report_frozen_detected, &status);
> -       if (reset_link(dev, service) != PCI_ERS_RESULT_RECOVERED)
> +       status = reset_link(dev, service);
>
> the status returned from report_frozen_detected is unconditionally
> masked by the status returned from reset_link, which is wrong.
>
> This breaks the error recovery implementation for the AMDGPU driver by
> masking PCI_ERS_RESULT_NEED_RESET returned from
> amdgpu_pci_error_detected and hence skipping the slot reset callback,
> which is necessary for proper ASIC recovery. Effectively, no callback
> other than the resume callback will be called after link reset the way
> it is implemented now, regardless of what value the error_detected
> callback returns.
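The masking described above can be shown with a small standalone model. This is only a sketch under stated assumptions, not the kernel code: the enum is a simplified stand-in for pci_ers_result_t, and error_detected(), reset_link() and slot_reset_runs() are hypothetical helpers written for this illustration. It assumes a driver that requests a slot reset and a link reset that succeeds.

```c
#include <stdbool.h>

/* Simplified stand-in for the kernel's pci_ers_result_t (assumption). */
enum ers_result {
	ERS_NONE,
	ERS_CAN_RECOVER,
	ERS_NEED_RESET,
	ERS_DISCONNECT,
	ERS_RECOVERED,
};

/* Hypothetical driver callback: requests a slot reset. */
static enum ers_result error_detected(void)
{
	return ERS_NEED_RESET;
}

/* Hypothetical link reset that always succeeds. */
static enum ers_result reset_link(void)
{
	return ERS_RECOVERED;
}

/*
 * Returns true if the slot reset stage would run.
 * overwrite_status == true models "status = reset_link(...)";
 * overwrite_status == false models checking the link reset result
 * separately and keeping the status reported by the driver.
 */
static bool slot_reset_runs(bool overwrite_status)
{
	enum ers_result status = error_detected();
	enum ers_result link = reset_link();

	if (overwrite_status)
		status = link;		/* driver's NEED_RESET is lost here */
	else if (link != ERS_RECOVERED)
		return false;		/* link reset failed, recovery aborts */

	return status == ERS_NEED_RESET;
}
```

In this model, slot_reset_runs(true) returns false because the driver's request is replaced by the link reset result, while slot_reset_runs(false) returns true because the driver's request survives a successful link reset.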
Instead of reverting this change, can you try the following patch?

https://lore.kernel.org/linux-pci/56ad4901-725f-7b88-2117-b124b28b027f@xxxxxxxxxxxxxxx/T/#me8029c04f63c21f9d1cb3b1ba2aeffbca3a60df5

--
Sathyanarayanan Kuppuswamy
Linux Kernel Developer