Re: [PATCH] PCI/ERR: Fix run error recovery callbacks for all affected devices

Sinan Kaya <okaya@xxxxxxxxxx> · Thu, 24 Jan 2019 13:18:26 -0500

On 1/24/2019 8:50 AM, Dongdong Liu wrote:
The patch [1] PCI/ERR: Run error recovery callbacks for all affected
devices has broken the non-fatal error handling logic from patch [2].
For a non-fatal error the link is still reliable, so there is no need to
reset it, and handling the error for all subordinates seems incorrect.
Restore the non-fatal error handling logic.

[1] PCI/ERR: Run error recovery callbacks for all affected devices   #4.20
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=bfcb79fca19d267712e425af1dd48812c40dec0c

[2] PCI/AER: Report non-fatal errors only to the affected endpoint  #4.15
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v5.0-rc2&id=86acc790717fb60fb51ea3095084e331d8711c74

Fixes: bfcb79fca19d ("PCI/ERR: Run error recovery callbacks for all affected devices")
Reported-by: Xiaofei Tan <tanxiaofei@xxxxxxxxxx>
Signed-off-by: Dongdong Liu <liudongdong3@xxxxxxxxxx>
Cc: Keith Busch <keith.busch@xxxxxxxxx>
Cc: Bjorn Helgaas <bhelgaas@xxxxxxxxxx>

According to what I see in the code, the link is reset only if the AER
severity is AER_FATAL.

	} else if (info->severity == AER_NONFATAL)
		pcie_do_recovery(dev, pci_channel_io_normal,
				 PCIE_PORT_SERVICE_AER);
	else if (info->severity == AER_FATAL)
		pcie_do_recovery(dev, pci_channel_io_frozen,
				 PCIE_PORT_SERVICE_AER);
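
To make the reasoning concrete, here is a minimal user-space sketch of the
decision as I read it. It is not the kernel code: the enum names and the
three helper functions are hypothetical stand-ins, and the only assumption
carried over is that the recovery path resets the link when the channel
state is frozen and not otherwise.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's severity and channel-state values. */
enum aer_severity { SEV_NONFATAL, SEV_FATAL, SEV_CORRECTABLE };
enum channel_state { CHANNEL_IO_NORMAL, CHANNEL_IO_FROZEN };

/*
 * Mirror of the dispatch quoted above: non-fatal maps to the "normal"
 * channel state, fatal maps to "frozen". Returns false when no recovery
 * is started for the given severity.
 */
bool severity_to_state(enum aer_severity sev, enum channel_state *state)
{
	switch (sev) {
	case SEV_NONFATAL:
		*state = CHANNEL_IO_NORMAL;
		return true;
	case SEV_FATAL:
		*state = CHANNEL_IO_FROZEN;
		return true;
	default:
		return false;
	}
}

/* Assumption: recovery resets the link only for a frozen channel. */
bool recovery_resets_link(enum channel_state state)
{
	return state == CHANNEL_IO_FROZEN;
}

/* Combine both steps: does an error of this severity end in a link reset? */
bool severity_resets_link(enum aer_severity sev)
{
	enum channel_state state;

	if (!severity_to_state(sev, &state))
		return false;
	return recovery_resets_link(state);
}
```

Under that assumption, severity_resets_link() is true only for the fatal
case, which is why I do not see how a non-fatal error gets to a link reset.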

Can you show the path that leads to a link reset when the severity is
AER_NONFATAL?