On Thu, Jan 24, 2019 at 10:18:26AM -0800, Sinan Kaya wrote:
> On 1/24/2019 8:50 AM, Dongdong Liu wrote:
> > The patch [1] PCI/ERR: Run error recovery callbacks for all affected
> > devices have broken the non-fatal error handling logic in patch [2].
> > For non-fatal error, link is reliable, so no need to reset link,
> > handle non-fatal error for all subordinates seems incorrect.
> > Restore the non-fatal errors process logic.
> >
> > [1] PCI/ERR: Run error recovery callbacks for all affected devices #4.20
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=bfcb79fca19d267712e425af1dd48812c40dec0c
> >
> > [2] PCI/AER: Report non-fatal errors only to the affected endpoint #4.15
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v5.0-rc2&id=86acc790717fb60fb51ea3095084e331d8711c74
> >
> > Fixes: bfcb79fca19d ("PCI/ERR: Run error recovery callbacks for all affected devices")
> > Reported-by: Xiaofei Tan<tanxiaofei@xxxxxxxxxx>
> > Signed-off-by: Dongdong Liu<liudongdong3@xxxxxxxxxx>
> > Cc: Keith Busch<keith.busch@xxxxxxxxx>
> > Cc: Bjorn Helgaas<bhelgaas@xxxxxxxxxx>
> >
> According to what I see in the code, link will be reset only if the AER
> severity is AER_FATAL.
>
> 	} else if (info->severity == AER_NONFATAL)
> 		pcie_do_recovery(dev, pci_channel_io_normal,
> 				 PCIE_PORT_SERVICE_AER);
> 	else if (info->severity == AER_FATAL)
> 		pcie_do_recovery(dev, pci_channel_io_frozen,
> 				 PCIE_PORT_SERVICE_AER);
>
> Can you show the path where it leads to link reset and severity is AER_NONFATAL?

Yes, I don't follow what this patch is saying either, and it actually
looks quite broken: it assigns 'bridge' to 'dev', which may not even be
a bridge, and then dereferences 'bridge->subordinate', which may be NULL.
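
To make the NULL-dereference concern concrete, here is a minimal userspace
sketch of the kind of guard I would expect before touching the subordinate
bus. The types 'fake_dev' and 'fake_bus' and the helper 'recovery_bus()' are
hypothetical stand-ins invented for illustration; they are not the kernel's
struct pci_dev, struct pci_bus, or any existing function, and this is not a
proposed fix.

```c
#include <stddef.h>

/* Hypothetical stand-in for a PCI bus; NOT the kernel's struct pci_bus. */
struct fake_bus {
	int number;
};

/* Hypothetical stand-in for a PCI device; NOT the kernel's struct pci_dev. */
struct fake_dev {
	int is_bridge;			/* nonzero only for bridge-type devices */
	struct fake_bus *subordinate;	/* secondary bus; NULL for endpoints */
};

/*
 * Return the bus whose devices should be walked, or NULL when there is
 * nothing below 'dev' to walk. The point is that 'dev' is checked before
 * 'dev->subordinate' is used, and that callers must still handle a NULL
 * result, because a bridge-type device may have no subordinate bus either.
 */
static struct fake_bus *recovery_bus(const struct fake_dev *dev)
{
	if (!dev || !dev->is_bridge)
		return NULL;

	return dev->subordinate;
}
```

With an endpoint ('is_bridge' == 0, 'subordinate' == NULL) the helper returns
NULL instead of following the pointer, which is the case the patch does not
seem to handle.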