Re: [PATCHv2 3/5] PCI/ERR: Retain status from error notification

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On 3/4/21 2:59 PM, Dan Williams wrote:
On Thu, Mar 4, 2021 at 2:38 PM Kuppuswamy, Sathyanarayanan
<sathyanarayanan.kuppuswamy@xxxxxxxxx> wrote:

On 3/4/21 2:11 PM, Dan Williams wrote:

On Thu, Mar 4, 2021 at 12:03 PM Keith Busch <kbusch@xxxxxxxxxx> wrote:

On Tue, Mar 02, 2021 at 09:46:40PM -0800, Kuppuswamy, Sathyanarayanan wrote:

On 3/2/21 9:34 PM, Williams, Dan J wrote:

[ Add Sathya ]

On Mon, 2021-01-04 at 15:02 -0800, Keith Busch wrote:

Overwriting the frozen detected status with the result of the link reset
loses the NEED_RESET result that drivers are depending on for error
handling to report the .slot_reset() callback. Retain this status so
that subsequent error handling has the correct flow.

Reported-by: Hinko Kocevar <hinko.kocevar@xxxxxx>
Acked-by: Sean V Kelley <sean.v.kelley@xxxxxxxxx>
Signed-off-by: Keith Busch <kbusch@xxxxxxxxxx>

Just want to report that this fix might be a candidate for -stable.

Agree.

I think it can be merged in both stable and mainline kernels.

Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@xxxxxxxxxxxxxxx>

Just FYI, this patch is practically a revert of this one:

   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit?id=6d2c89441571ea534d6240f7724f518936c44f8d

so please let me know if that is still a problem for you.

For what it's worth I think "6d2c89441571 PCI/ERR: Update error status
after reset_link()" is not justified. The link shouldn't recover if
the attached device is not prepared to handle DPC events.

I added that fix to address the recovery issue seen in a Dell server
platform (for EDR test case). If I understand the history correctly,
In EDR case, AER and DPC is owned by firmware, hence we get
PCI_ERS_RESULT_NO_AER_DRIVER when executing error_detected() callbacks.
So If we continue the pcie_do_recovery() with PCI_ERS_RESULT_NO_AER_DRIVER
as error status, then even if we successfully reset the link we will report
the recovery status as failure.
But that's the right response if there is no handler.
If the handler is not available due to AER being owned by firmware,
then it needs to be fixed. In EDR mode, even if DPC/AER is owned
by firmware , OS need to own the recovery part. So I think it
needs further investigation to understand why it reports,
PCI_ERS_RESULT_NO_AER_DRIVER
If there is a
device attached that was not prepared for the link to go down then it
does not matter if the link comes back ok that device will be
thoroughly confused and should be disconnected.

--
Sathyanarayanan Kuppuswamy
Linux Kernel Developer




[Index of Archives]     [DMA Engine]     [Linux Coverity]     [Linux USB]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Greybus]

  Powered by Linux