On Tue, Oct 02, 2018 at 02:35:22PM -0500, Bjorn Helgaas wrote:
> Here's my proposal for the changelog. Let me know what I screwed up.
>
> commit 1f7d2967334433d885c0712b8ac3f073f20211ee
> Author: Keith Busch <keith.busch@xxxxxxxxx>
> Date:   Thu Sep 20 10:27:13 2018 -0600
>
>     PCI/ERR: Run error recovery callbacks for all affected devices
>
>     If an Endpoint reported an error with ERR_FATAL, we previously ran driver
>     error recovery callbacks only for the Endpoint's driver. But if we reset a
>     Link to recover from the error, all downstream components are affected,
>     including the Endpoint, any multi-function peers, and children of those
>     peers.
>
>     Initiate the Link reset from the deepest Downstream Port that is
>     reliable, and call the error recovery callbacks for all its children.
>
>     If a Downstream Port (including a Root Port) reports an error, we assume
>     the Port itself is reliable and we need to reset its downstream Link. In
>     all other cases (Switch Upstream Ports, Endpoints, Bridges, etc), we assume
>     the Link leading to the component needs to be reset, so we initiate the
>     reset at the parent Downstream Port.
>
>     This allows two other clean-ups. First, we currently only use a Link
>     reset, which can only be initiated using a Downstream Port, so we can
>     remove checks for Endpoints. Second, the Downstream Port where we initiate
>     the Link reset is reliable (unlike the device that reported the error), so
>     the special cases for error detect and resume are no longer necessary.

A downstream port may have been the device that reports the error, but we
still consider that to be accessible. Maybe "unlike its subordinate bus".

Otherwise this sounds good to me.
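For illustration only, the port-selection rule in the changelog can be sketched as a small standalone C program. All of the types and functions here (`hyp_dev`, `hyp_reset_port()`, `hyp_recover()`, etc.) are hypothetical and model only what the changelog describes. This is not the drivers/pci code. A Downstream Port or Root Port that reports the error resets its own downstream Link. Any other reporter has the Link above it reset from its parent port. Every device below that port then gets a recovery callback.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define HYP_MAX_CHILDREN 4

/* Hypothetical device kinds, named after the changelog's wording. */
enum hyp_pcie_type {
	HYP_ROOT_PORT,
	HYP_DOWNSTREAM_PORT,
	HYP_UPSTREAM_PORT,
	HYP_ENDPOINT,
	HYP_BRIDGE,
};

/* Hypothetical device node: a parent port plus its subordinate devices. */
struct hyp_dev {
	const char *name;
	enum hyp_pcie_type type;
	struct hyp_dev *parent;	/* port above this device; NULL at the top */
	struct hyp_dev *children[HYP_MAX_CHILDREN];
	int nr_children;
};

static void hyp_add_child(struct hyp_dev *parent, struct hyp_dev *child)
{
	assert(parent->nr_children < HYP_MAX_CHILDREN);
	child->parent = parent;
	parent->children[parent->nr_children++] = child;
}

/*
 * Choose the port that initiates the Link reset. A Downstream Port
 * (including a Root Port) is assumed reliable and resets its own
 * downstream Link. Anything else has the Link leading to it reset from
 * the parent port. Returns NULL if there is no such port (this sketch
 * does not handle that case further).
 */
static struct hyp_dev *hyp_reset_port(struct hyp_dev *reporter)
{
	if (reporter->type == HYP_ROOT_PORT ||
	    reporter->type == HYP_DOWNSTREAM_PORT)
		return reporter;
	return reporter->parent;
}

/* Stand-in for a driver recovery callback: log the device and count it. */
static void hyp_recovery_cb(struct hyp_dev *dev, int *count)
{
	printf("  recovery callback: %s\n", dev->name);
	(*count)++;
}

/* Visit every device below @port, depth first. */
static void hyp_walk_subordinates(struct hyp_dev *port, int *count)
{
	for (int i = 0; i < port->nr_children; i++) {
		hyp_recovery_cb(port->children[i], count);
		hyp_walk_subordinates(port->children[i], count);
	}
}

/*
 * Return the number of devices whose callbacks ran, or -1 if there is
 * no port to reset from.
 */
static int hyp_recover(struct hyp_dev *reporter)
{
	struct hyp_dev *port = hyp_reset_port(reporter);
	int count = 0;

	if (!port)
		return -1;
	printf("reset Link below %s\n", port->name);
	hyp_walk_subordinates(port, &count);
	return count;
}
```

Consider a Root Port above a Switch whose Downstream Port leads to a two-function Endpoint. An ERR_FATAL reported by function 0 would reset the Link below the Switch Downstream Port. Callbacks would then run for both functions, not just the reporter.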