Re: [PATCH 1/4] PCI: pciehp: Ignore Link Down/Up caused by error-induced Hot Reset

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Oct 07, 2021 at 06:00:20PM -0500, Bjorn Helgaas wrote:
> On Sat, Jul 31, 2021 at 02:39:01PM +0200, Lukas Wunner wrote:
> > The issue isn't DPC-specific, it also occurs when an error is handled by
> > AER through aer_root_reset().  So while the issue was noticed only now,
> > it's been around since 2006 when AER support was first introduced.
> > 
> > Fixes: 6c2b374d7485 ("PCI-Express AER implemetation: AER core and aerdriver")
[...]
> > Cc: stable@xxxxxxxxxxxxxxx # v2.6.19+: ba952824e6c1: PCI/portdrv: Report reset for frozen channel
> 
> Have you tried backporting this to v2.6.19?  I bet it's tough.  Not
> sure we should suggest that stable pick this up unless there's a
> reasonable path to do that.

The oldest kernel.org stable release that still receives updates is v4.4.
There may be older distribution kernels out there which continue to be
supported.  We don't know for sure, so I think it's customary to tag
the release that introduced an issue and leave it to the stable and
distribution maintainers to choose what they backport.

It's true that patches often do not apply cleanly to older releases.
There are some good folks who regularly take up the thankless task of
backporting (Sudip Mukherjee to name but one), but I've also frequently
backported patches myself where necessary.


> > +config PCI_ERROR_RECOVERY
> > +	def_bool PCIEAER || EEH
> 
> I'm having a hard time connecting this to the code.
[...]
> But this still seems like it's kind of dangling.  It's not obvious to
> me why pciehp_slot_reset() should be inside that #ifdef.  Do we need
> the #ifdef for a functional reason, or is it there because we know it
> will never be called?  If the latter, I don't think the savings are
> worth it.

The motivation for the #ifdef was merely to reduce code size if neither
of PCIEAER or EEH is enabled.  Happy to remove it.

I have a different question though.  We've often discussed deprecating
portdrv and moving port services to the core instead.  I've stumbled
across commit a39bd851dccf ("PCI/PM: Clear PCIe PME Status bit in core,
not PCIe port driver"), wherein you moved pcie_pme_root_status_cleanup()
from portdrv to the core, which allowed you to drop the ->resume_noirq
callback from portdrv.

I've been thinking what moving port services to the core would look like
and what your vision for that might be.  If that commit is any indication,
it seems you'd probably prefer that pciehp_slot_reset() (as introduced in
the present patch) is called directly from report_slot_reset() in
drivers/pci/pcie/err.c.

That would eliminate much of the plumbing contained in the present patch
to allow each port service driver to define a ->slot_reset callback and
iterate over those callbacks.  pciehp is probably the only port service
which requires special handling upon ->slot_reset and the same goes for
a lot of the error handling and power management callbacks defined by
port services.  So all that plumbing is probably just a very roundabout
way of doing things.

When calling into port service code from core code, one needs to find
the port service's driver data.  For now we can resort to
pcie_port_find_device(), but I imagine once we move all port services
to the core, we'll be able to access their data directly from
struct pci_dev, e.g. via pdev->hotplug or pdev->pcie->hotplug pointers.

Does that make sense to you?

Thanks,

Lukas



[Index of Archives]     [DMA Engine]     [Linux Coverity]     [Linux USB]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Greybus]

  Powered by Linux