Re: [PATCH] PCI/ERR: Fix run error recovery callbacks for all affected devices

Keith Busch <keith.busch@xxxxxxxxx> · Thu, 24 Jan 2019 14:37:01 -0700

On Thu, Jan 24, 2019 at 10:18:26AM -0800, Sinan Kaya wrote:
> On 1/24/2019 8:50 AM, Dongdong Liu wrote:
> > The patch [1] PCI/ERR: Run error recovery callbacks for all affected
> > devices have broken the non-fatal error handling logic in patch [2].
> > For non-fatal error, link is reliable, so no need to reset link,
> > handle non-fatal error for all subordinates seems incorrect.
> > Restore the non-fatal errors process logic.
> > 
> > [1] PCI/ERR: Run error recovery callbacks for all affected devices   #4.20
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=bfcb79fca19d267712e425af1dd48812c40dec0c
> > 
> > [2] PCI/AER: Report non-fatal errors only to the affected endpoint  #4.15
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v5.0-rc2&id=86acc790717fb60fb51ea3095084e331d8711c74
> > 
> > Fixes: bfcb79fca19d ("PCI/ERR: Run error recovery callbacks for all affected devices")
> > Reported-by: Xiaofei Tan<tanxiaofei@xxxxxxxxxx>
> > Signed-off-by: Dongdong Liu<liudongdong3@xxxxxxxxxx>
> > Cc: Keith Busch<keith.busch@xxxxxxxxx>
> > Cc: Bjorn Helgaas<bhelgaas@xxxxxxxxxx>
> 
> 
> According to what I see in the code, link will be reset only if the AER
> severity is AER_FATAL.
> 
> 	} else if (info->severity == AER_NONFATAL)
> 		pcie_do_recovery(dev, pci_channel_io_normal,
> 				 PCIE_PORT_SERVICE_AER);
> 	else if (info->severity == AER_FATAL)
> 		pcie_do_recovery(dev, pci_channel_io_frozen,
> 				 PCIE_PORT_SERVICE_AER);
> 
> Can you show the path where it leads to link reset and severity is AER_NONFATAL?

Yes, I don't follow what this patch is saying either, and it actually
looks quite broken: it assigns 'bridge' to 'dev', which may not even be a
bridge, and then dereferences 'bridge->subordinate', which may be NULL.
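
To make the concern concrete, here is a minimal userspace sketch of the kind
of guard that is missing. The types and the helper name (mock_bus, mock_dev,
mock_subordinate_or_null) are hypothetical stand-ins invented for
illustration; they are not the kernel's struct pci_bus / struct pci_dev and
not code from the patch. The assumption is simply that a device is only worth
treating as a bridge if it really is one, and that even a real bridge may have
no subordinate bus attached.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a PCI bus; not the kernel's struct pci_bus. */
struct mock_bus {
	int number;
};

/* Hypothetical stand-in for a PCI device; not the kernel's struct pci_dev. */
struct mock_dev {
	bool is_bridge;               /* assumed flag: device has a bridge header */
	struct mock_bus *subordinate; /* NULL for endpoints, possibly for bridges */
};

/*
 * Return the bus below 'dev', or NULL if there is nothing to walk.
 * The caller must check the result before dereferencing it.
 */
static struct mock_bus *mock_subordinate_or_null(const struct mock_dev *dev)
{
	/* An endpoint (or a missing device) has no bus below it. */
	if (dev == NULL || !dev->is_bridge)
		return NULL;

	/* A bridge can still lack a subordinate bus, so this may be NULL. */
	return dev->subordinate;
}
```

A caller would test the returned pointer and skip the subordinate walk when it
is NULL, rather than assuming every 'dev' handed to recovery is a bridge with
a populated bus.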