Re: [PATCHv4 08/12] PCI: ERR: Always use the first downstream port

Keith Busch <keith.busch@xxxxxxxxx> · Tue, 2 Oct 2018 13:55:00 -0600

On Tue, Oct 02, 2018 at 02:35:22PM -0500, Bjorn Helgaas wrote:
> Here's my proposal for the changelog.  Let me know what I screwed up.
> 
> commit 1f7d2967334433d885c0712b8ac3f073f20211ee
> Author: Keith Busch <keith.busch@xxxxxxxxx>
> Date:   Thu Sep 20 10:27:13 2018 -0600
> 
>     PCI/ERR: Run error recovery callbacks for all affected devices
>     
>     If an Endpoint reported an error with ERR_FATAL, we previously ran driver
>     error recovery callbacks only for the Endpoint's driver.  But if we reset a
>     Link to recover from the error, all downstream components are affected,
>     including the Endpoint, any multi-function peers, and children of those
>     peers.
>     
>     Initiate the Link reset from the deepest Downstream Port that is
>     reliable, and call the error recovery callbacks for all its children.
>     
>     If a Downstream Port (including a Root Port) reports an error, we assume
>     the Port itself is reliable and we need to reset its downstream Link.  In
>     all other cases (Switch Upstream Ports, Endpoints, Bridges, etc), we assume
>     the Link leading to the component needs to be reset, so we initiate the
>     reset at the parent Downstream Port.
>     
>     This allows two other clean-ups.  First, we currently only use a Link
>     reset, which can only be initiated using a Downstream Port, so we can
>     remove checks for Endpoints.  Second, the Downstream Port where we initiate
>     the Link reset is reliable (unlike the device that reported the error), so
>     the special cases for error detect and resume are no longer necessary.

A downstream port may have been the device that reported the error, but
we still consider it to be accessible. Maybe say "unlike its subordinate
bus" instead.
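
To make the selection rule in the changelog concrete, here is a minimal
standalone sketch. It is a toy model, not the kernel code: struct toy_dev,
enum toy_pcie_type, and toy_reset_port() are hypothetical names made up for
illustration, and the real struct pci_dev carries far more state.

```c
#include <stddef.h>

/* Hypothetical, simplified device types (assumption: not the kernel's values). */
enum toy_pcie_type {
	TOY_ENDPOINT,
	TOY_ROOT_PORT,
	TOY_UPSTREAM_PORT,
	TOY_DOWNSTREAM_PORT,
	TOY_PCI_BRIDGE,
};

/* Hypothetical stand-in for a PCI device: only what this sketch needs. */
struct toy_dev {
	enum toy_pcie_type type;
	struct toy_dev *parent_port;	/* port directly above; NULL for a Root Port */
};

/*
 * Return the port where the Link reset would be initiated.
 *
 * A Root Port or Switch Downstream Port that reports an error is assumed
 * to be reliable itself, so the Link below it is reset.  For any other
 * reporter the Link leading to it is suspect, so the reset starts at the
 * parent Downstream Port.
 */
static struct toy_dev *toy_reset_port(struct toy_dev *reporter)
{
	if (reporter->type == TOY_ROOT_PORT ||
	    reporter->type == TOY_DOWNSTREAM_PORT)
		return reporter;

	return reporter->parent_port;
}
```

With a Root Port -> Switch Upstream Port -> Switch Downstream Port ->
Endpoint hierarchy, an error from the Endpoint selects the Switch
Downstream Port, and an error from the Switch Upstream Port selects the
Root Port. The recovery callbacks would then run for everything on the
selected port's subordinate bus, which is the part we treat as unreliable.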

Otherwise this sounds good to me.