On Wed, Sep 26, 2018 at 05:01:16PM -0500, Bjorn Helgaas wrote:
> On Thu, Sep 20, 2018 at 10:27:13AM -0600, Keith Busch wrote:
> > The link reset always used the first bridge device, but AER broadcast
> > error handling may have reported an end device. This means the reset may
> > hit devices that were never notified of the impending error recovery.
> >
> > This patch uses the first downstream port in the hierarchy considered
> > reliable. An error detected by a switch upstream port should mean it
> > occurred on its upstream link, so the patch selects the parent device
> > if the error is not a root or downstream port.
>
> I'm not really clear on what "Always use the first downstream port"
> means. Always use it for *what*?
>
> I already applied this, but if we can improve the changelog, I'll
> gladly update it.

I'll see if I can rephrase it better.

Error handling should notify all affected PCI functions. If an end device
detects and emits ERR_FATAL, the old way would have notified only that
end device's driver, but other functions may be on or below the same bus.

By using the downstream port that connects to the bus where the error was
detected as the anchor point for broadcasting error handling progress, we
can notify all functions so they have a chance to prepare for the link
reset.
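
To make the anchor selection concrete, here is a rough user-space sketch of
the idea. It is not the kernel code: struct dev_node, enum port_type,
recovery_anchor() and notify_below() are all hypothetical names made up for
illustration, and it assumes every device that is not a root or downstream
port has a parent bridge.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical port types, loosely mirroring PCIe device/port types. */
enum port_type {
	TYPE_ENDPOINT,
	TYPE_ROOT_PORT,
	TYPE_UPSTREAM,
	TYPE_DOWNSTREAM,
};

/* Hypothetical, simplified device node; not the kernel's struct pci_dev. */
struct dev_node {
	enum port_type type;
	struct dev_node *parent_bridge;	/* bridge leading to this device's bus */
	struct dev_node *children[4];	/* devices on the bus below this bridge */
	size_t nr_children;
	int notified;			/* set once error handling reached it */
};

/*
 * Pick the port used as the anchor for broadcasting error handling.
 * A root or downstream port is used as-is; anything else (an end device
 * or a switch upstream port) defers to the bridge above it, since the
 * error is assumed to be on the link or bus that bridge connects to.
 */
struct dev_node *recovery_anchor(struct dev_node *dev)
{
	assert(dev != NULL);
	if (dev->type != TYPE_ROOT_PORT && dev->type != TYPE_DOWNSTREAM) {
		/* Assumption of this sketch: such a device has a parent. */
		assert(dev->parent_bridge != NULL);
		return dev->parent_bridge;
	}
	return dev;
}

/*
 * Mark every device on or below the bridge's bus as notified and
 * return how many devices were reached.
 */
size_t notify_below(struct dev_node *bridge)
{
	size_t count = 0;

	for (size_t i = 0; i < bridge->nr_children; i++) {
		struct dev_node *child = bridge->children[i];

		child->notified = 1;
		count += 1 + notify_below(child);
	}
	return count;
}
```

With a topology of root port -> switch upstream port -> switch downstream
port -> two end functions, an error reported by one end function anchors at
the switch downstream port, so both functions on that bus get notified
before the reset. An error reported by the switch upstream port anchors at
the root port instead, which covers the whole switch.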