Re: [PATCHv2 3/5] PCI/ERR: Retain status from error notification

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Mar 4, 2021 at 3:19 PM Kuppuswamy, Sathyanarayanan
<sathyanarayanan.kuppuswamy@xxxxxxxxx> wrote:
>
>
> On 3/4/21 2:59 PM, Dan Williams wrote:
> > On Thu, Mar 4, 2021 at 2:38 PM Kuppuswamy, Sathyanarayanan
> > <sathyanarayanan.kuppuswamy@xxxxxxxxx> wrote:
> >>
> >> On 3/4/21 2:11 PM, Dan Williams wrote:
> >>
> >> On Thu, Mar 4, 2021 at 12:03 PM Keith Busch <kbusch@xxxxxxxxxx> wrote:
> >>
> >> On Tue, Mar 02, 2021 at 09:46:40PM -0800, Kuppuswamy, Sathyanarayanan wrote:
> >>
> >> On 3/2/21 9:34 PM, Williams, Dan J wrote:
> >>
> >> [ Add Sathya ]
> >>
> >> On Mon, 2021-01-04 at 15:02 -0800, Keith Busch wrote:
> >>
> >> Overwriting the frozen detected status with the result of the link reset
> >> loses the NEED_RESET result that drivers are depending on for error
> >> handling to report the .slot_reset() callback. Retain this status so
> >> that subsequent error handling has the correct flow.
> >>
> >> Reported-by: Hinko Kocevar <hinko.kocevar@xxxxxx>
> >> Acked-by: Sean V Kelley <sean.v.kelley@xxxxxxxxx>
> >> Signed-off-by: Keith Busch <kbusch@xxxxxxxxxx>
> >>
> >> Just want to report that this fix might be a candidate for -stable.
> >>
> >> Agree.
> >>
> >> I think it can be merged in both stable and mainline kernels.
> >>
> >> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@xxxxxxxxxxxxxxx>
> >>
> >> Just FYI, this patch is practically a revert of this one:
> >>
> >>    https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit?id=6d2c89441571ea534d6240f7724f518936c44f8d
> >>
> >> so please let me know if that is still a problem for you.
> >>
> >> For what it's worth I think "6d2c89441571 PCI/ERR: Update error status
> >> after reset_link()" is not justified. The link shouldn't recover if
> >> the attached device is not prepared to handle DPC events.
> >>
> >> I added that fix to address the recovery issue seen in a Dell server
> >> platform (for EDR test case). If I understand the history correctly,
> >> In EDR case, AER and DPC is owned by firmware, hence we get
> >> PCI_ERS_RESULT_NO_AER_DRIVER when executing error_detected() callbacks.
> >> So If we continue the pcie_do_recovery() with PCI_ERS_RESULT_NO_AER_DRIVER
> >> as error status, then even if we successfully reset the link we will report
> >> the recovery status as failure.
> > But that's the right response if there is no handler.
> If the handler is not available due to AER being owned by firmware,
> then it needs to be fixed. In EDR mode, even if DPC/AER is owned
> by firmware , OS need to own the recovery part. So I think it
> needs further investigation to understand why it reports,
> PCI_ERS_RESULT_NO_AER_DRIVER

As far as I can see the only way to get PCI_ERS_RESULT_NO_AER_DRIVER
is when there actually is no handler, or the device io state has set
to failed. I notice the hotplug handler sets the device io state to
failed while processing link down. If the device is actually missing a
handler definition then disconnect seems the right response.



[Index of Archives]     [DMA Engine]     [Linux Coverity]     [Linux USB]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Greybus]

  Powered by Linux