Re: [PATCH v3 4/4] cxl: Add post reset warning if reset is detected as Secondary Bus Reset (SBR)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Apr 03, 2024 at 04:32:57PM +0100, Jonathan Cameron wrote:
> On Tue, 2 Apr 2024 16:45:32 -0700 Dave Jiang <dave.jiang@xxxxxxxxx> wrote:
> Why not pass the info on what reset was done down from the PCI core?
> I see Bjorn commented it would be *possible* to do it in the PCI core
> but raised other concerns that needed addressing first (I think you've
> dealt with those now).  Doesn't look that hard to me (I've not coded it
> up yet though).
> 
> The core code knows how far it got down the list reset_methods before
> it succeeded in resetting.  So...
> 
> Modify __pci_reset_function_locked() to return the index of the reset
> method that succeeded. Then pass that to pci_dev_restore().
> Finally push it into a reset_done2() that takes that as an extra
> parameter and the driver can see if it is FLR or SBR.

The reset types to distinguish per PCIe r6.2 sec 6.6 are
Conventional Reset and Function Level Reset.

Secondary Bus Reset is a Conventional Reset.

The spec subdivides Conventional Reset into Cold, Warm and Hot,
but that distinction is probably irrelevant for the kernel.

I think a more generalized (and therefore better) approach would be
to store the reset type the device has undergone in struct pci_dev,
right next to error_state, so that not just the ->reset_done()
callback benefits from the information.  The reset type applied has
consequences beyond the individual driver:  E.g. an FLR does not
affect CMA-SPDM session state, but a Conventional Reset does.
So there may be consumers of that information in the PCI core as well.

It's worth noting that we already have an enum pcie_reset_state in
<linux/pci.h> which distinguishes between deassert, warm and hot reset.
It is currently only used by PowerPC EEH to convey to the platform
which type of reset it should apply.  It might be possible to extend
the enum so that it can be used to store the reset type that *was*
applied to a device in struct pci_dev.

That all being said, checking for the *symptoms* of a Conventional Reset,
as Dave has done here, may actually be more robust than just relying on
what type of reset was applied.  E.g. after an FLR was handled, the device
may experience a DPC-induced Hot Reset.  By checking for the *symptoms*,
the driver may be able to catch that the device has undergone a
Conventional Reset immediately after an FLR.  Also, who knows if all
devices are well-behaved and retain their state during an FLR, as they
should per the spec?  Maybe there are broken devices which do not respect
that rule.  Checking for symptoms of a Conventional Reset would catch
those devices as well.

Thanks,

Lukas




[Index of Archives]     [DMA Engine]     [Linux Coverity]     [Linux USB]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Greybus]

  Powered by Linux