Re: [PATCH v3] PCIe AER: report uncorrectable errors only to the functions that logged the errors

On Thu, Sep 28, 2017 at 03:33:05PM +0100, Gabriele Paoloni wrote:
> Currently, if an uncorrectable error is reported by an EP, the AER
> driver walks over all the devices connected to the upstream port
> bus and in turn calls the report_error_detected() callback for each.
> If any of the devices connected to the bus does not implement
> dev->driver->err_handler->error_detected(), do_recovery() will fail,
> leaving all the bus hierarchy devices unrecovered.
> 
> According to section "6.2.2.2.2. Non-Fatal Errors" of the PCIe specs
> << Non-fatal errors are uncorrectable errors which cause a particular
> transaction to be unreliable but the Link is otherwise fully functional.
> Isolating Non-fatal from Fatal errors provides Requester/Receiver logic
> in a device or system management software the opportunity to recover
> from the error without resetting the components on the Link and
> disturbing other transactions in progress. Devices not associated with
> the transaction in error are not impacted by the error.>>
> Therefore, for non-fatal errors, the PCIe link should not be considered
> compromised, and it makes sense to report the error only to the
> functions that logged an error.
> 
> This patch implements this new behaviour for non-fatal errors.
> It also fixes the bug filed at the link below.
> 
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=197055
> Fixes: 6c2b374d7485 ("PCI-Express AER implemetation: AER core and aerdriver")
> Signed-off-by: Gabriele Paoloni <gabriele.paoloni@xxxxxxxxxx>
> Signed-off-by: Dongdong Liu <liudongdong3@xxxxxxxxxx>

Applied to pci/aer for v4.15, thanks!

I rewrote some of the changelog to say "non-fatal" instead of
"uncorrectable", since "uncorrectable" also includes fatal errors,
and you're not changing those.  Take a look and let me know if
I broke anything.

> ---
> Changes from v2:
>    - no functional changes
>    - Added reference in the commit log to the bugzilla ticket
>    - Added reference in the commit log to the commit that this patch fixes
>    - Added reference in the commit log to the PCIe specs for Non-fatal
>      error handling rules
>  
> Changes from v1:
>    - now errors are reported only to the functions that logged the error
>      instead of all the functions in the same device.
>    - the patch subject has changed to match the new implementation
> ---
>  drivers/pci/pcie/aer/aerdrv_core.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/pcie/aer/aerdrv_core.c b/drivers/pci/pcie/aer/aerdrv_core.c
> index 890efcc..7448052 100644
> --- a/drivers/pci/pcie/aer/aerdrv_core.c
> +++ b/drivers/pci/pcie/aer/aerdrv_core.c
> @@ -390,7 +390,14 @@ static pci_ers_result_t broadcast_error_message(struct pci_dev *dev,
>  		 * If the error is reported by an end point, we think this
>  		 * error is related to the upstream link of the end point.
>  		 */
> -		pci_walk_bus(dev->bus, cb, &result_data);
> +		if (state == pci_channel_io_normal)
> +			/*
> +			 * the error is non fatal so the bus is ok, just invoke
> +			 * the callback for the function that logged the error.
> +			 */
> +			cb(dev, &result_data);
> +		else
> +			pci_walk_bus(dev->bus, cb, &result_data);
>  	}
>  
>  	return result_data.result;
> -- 
> 2.7.4
> 
> 
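
For anyone reading this in the archive, the endpoint branch of the hunk
above can be modelled in userspace roughly as follows. This is only a
sketch under stated assumptions, not the kernel code: struct fake_dev,
struct fake_bus, walk_bus(), count_cb() and broadcast_from_endpoint() are
hypothetical stand-ins for struct pci_dev, struct pci_bus, pci_walk_bus(),
the report callback and the patched part of broadcast_error_message().
CHANNEL_IO_NORMAL stands in for pci_channel_io_normal (the non-fatal
case), and CHANNEL_IO_FROZEN for any other channel state.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
enum channel_state { CHANNEL_IO_NORMAL, CHANNEL_IO_FROZEN };

struct fake_dev {
	const char *name;
	int notified;		/* how many times the callback ran for this function */
};

struct fake_bus {
	struct fake_dev *devs;
	size_t ndevs;
};

typedef void (*report_cb)(struct fake_dev *dev, void *data);

/* Example callback: counts calls in *data and marks the device. */
static void count_cb(struct fake_dev *dev, void *data)
{
	int *calls = data;

	dev->notified++;
	(*calls)++;
}

/* Simplified model of pci_walk_bus(): invoke cb on every device on the bus. */
static void walk_bus(struct fake_bus *bus, report_cb cb, void *data)
{
	for (size_t i = 0; i < bus->ndevs; i++)
		cb(&bus->devs[i], data);
}

/*
 * Model of the patched endpoint branch: for a non-fatal error only the
 * function that logged the error is notified; otherwise every device on
 * the bus is notified, as before the patch.
 */
static void broadcast_from_endpoint(struct fake_bus *bus, struct fake_dev *dev,
				    enum channel_state state, report_cb cb,
				    void *data)
{
	assert(bus && dev && cb);

	if (state == CHANNEL_IO_NORMAL)
		cb(dev, data);
	else
		walk_bus(bus, cb, data);
}
```

With three functions on the bus and the error logged by the second one,
the non-fatal case invokes the callback once, and the other case invokes
it three times.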