On Wed, Sep 01, 2021 at 07:48:18AM +0200, Lukas Wunner wrote: > On Tue, Aug 31, 2021 at 04:58:01PM -0500, Bjorn Helgaas wrote: > > I just think it's > > conceivable that one might *want* portdrv to not claim an intermediate > > switch like that. > > It's possible to manually unbind portdrv from the device via sysfs > (because portdrv is a driver). In that case the port will not restore > config space upon an error-induced reset and any devices downstream > of the port will be inaccessible after the reset. > > That's the only possible way to screw this up I think. > And it requires deliberate, manual action. One *could* argue that's > not correct and the kernel shouldn't allow the incorrect behavior > in the first place. The behavior follows from portdrv being a driver, > instead of its functionality being baked into the PCI core. Right. I do think the overall PCI design would be significantly cleaner if the portdrv functionality were baked into the PCI core instead of being a driver. > > Or maybe you don't have portdrv configured at all. Do we still > > save/restore config space for suspend/resume of the switch? > > We do, because the PCI core takes care of that. E.g. on resume > from system sleep: > > pci_pm_resume_noirq() > pci_pm_default_resume_early() > pci_restore_state() > > However after an error-induced reset, it's the job of the device > driver's ->slot_reset() callback to restore config space. > That's a design decision that was made back in 2005 when EEH > was introduced. See Documentation/PCI/pci-error-recovery.rst: > > It is important for the platform to restore the PCI config space > to the "fresh poweron" state, rather than the "last state". After > a slot reset, the device driver will almost always use its standard > device initialization routines, and an unusual config space setup > may result in hung devices, kernel panics, or silent data corruption. > > I guess it would be possible to perform certain tasks such as > pci_restore_state() centrally in report_slot_reset() instead > (in drivers/pci/pcie/err.c) and alleviate each driver from doing that. > > One has to bear in mind though that a device may require specific > steps before pci_restore_state() is called. E.g. in the case of > portdrv, spurious hotplug DLLSC events need to be acknowledged > first: > > https://patchwork.ozlabs.org/project/linux-pci/patch/251f4edcc04c14f873ff1c967bc686169cd07d2d.1627638184.git.lukas@xxxxxxxxx/ As far as I know, pci_restore_state() only restores things specified by the PCIe spec. It doesn't restore any device-specific state, so I'm a little hesitant about inserting device-specific things in the middle of that flow. I know you're solving a real problem with that patch, and I don't have any better suggestions, but it will take me a while to assimilate this. Thanks for all your analysis; it is very helpful! > If portdrv isn't configured at all, AER and DPC support cannot be > configured either (because they depend on PCIEPORTBUS), and it's the > reset performed by AER or DPC which necessitates calling pci_restore_state(). > > If a port supports none of portdrv's services, portdrv still binds to > the port and is thus able to restore config space if a reset is performed > at a port further upstream. That's because of ... > > if (!capabilities) > return 0; > > ... in pcie_port_device_register(). So that should be working correctly. > > Thanks, > > Lukas