On 1/24/2019 8:50 AM, Dongdong Liu wrote:
The patch [1] ("PCI/ERR: Run error recovery callbacks for all affected devices") broke the non-fatal error handling logic introduced in patch [2]. For a non-fatal error the link is still reliable, so there is no need to reset it, and handling the non-fatal error for all subordinates seems incorrect. Restore the non-fatal error processing logic.

[1] PCI/ERR: Run error recovery callbacks for all affected devices #4.20
    https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=bfcb79fca19d267712e425af1dd48812c40dec0c
[2] PCI/AER: Report non-fatal errors only to the affected endpoint #4.15
    https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v5.0-rc2&id=86acc790717fb60fb51ea3095084e331d8711c74

Fixes: bfcb79fca19d ("PCI/ERR: Run error recovery callbacks for all affected devices")
Reported-by: Xiaofei Tan <tanxiaofei@xxxxxxxxxx>
Signed-off-by: Dongdong Liu <liudongdong3@xxxxxxxxxx>
Cc: Keith Busch <keith.busch@xxxxxxxxx>
Cc: Bjorn Helgaas <bhelgaas@xxxxxxxxxx>
According to what I see in the code, the link will be reset only if the AER severity is AER_FATAL:

	} else if (info->severity == AER_NONFATAL)
		pcie_do_recovery(dev, pci_channel_io_normal, PCIE_PORT_SERVICE_AER);
	else if (info->severity == AER_FATAL)
		pcie_do_recovery(dev, pci_channel_io_frozen, PCIE_PORT_SERVICE_AER);

Can you show the path where it leads to a link reset while the severity is AER_NONFATAL?
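
To make the question concrete, here is a small userspace model of the dispatch quoted above. It is only a sketch, not kernel code: every `model_*` and `MODEL_*` name is a hypothetical stand-in, and the rule that recovery resets the link only for a frozen channel is my reading of the recovery path, stated here as an assumption rather than taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the two AER severities in the quoted snippet. */
enum model_severity { MODEL_AER_NONFATAL, MODEL_AER_FATAL };

/* Hypothetical stand-ins for pci_channel_io_normal / pci_channel_io_frozen. */
enum model_channel_state { MODEL_IO_NORMAL, MODEL_IO_FROZEN };

/* Mirrors the quoted dispatch: non-fatal maps to normal, fatal to frozen. */
enum model_channel_state model_state_for(enum model_severity sev)
{
	return sev == MODEL_AER_FATAL ? MODEL_IO_FROZEN : MODEL_IO_NORMAL;
}

/*
 * Assumption (not shown in the quoted code): the recovery routine
 * resets the link only when it is called with the frozen state.
 */
bool model_link_reset_needed(enum model_severity sev)
{
	return model_state_for(sev) == MODEL_IO_FROZEN;
}
```

Under that assumption, `model_link_reset_needed(MODEL_AER_NONFATAL)` is false, which is why I am asking for the concrete path that ends in a link reset for a non-fatal error.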