On Tue, Feb 06, 2024 at 04:37:36PM +0100, Stanislaw Gruszka wrote: > > > > If this is a real possibility (I mean, device in a low-power state and > > > > in an error state at the same time), it would be better to call > > > > __pm_runtime_disable(dev, false) instead of pm_runtime_get_sync(), as > > > > that would prevent runtime PM from changing the device state. > > > > > > __pm_runtime_disable(dev, false) does not work as reliable in my > > > test as pm_runtime_get_sync(), the > > > > > > igc 0000:02:00.0: Unable to change power state from D3hot to D0, device inaccessible > > > > > > message disappears, but sill have: > > > > > > igc 0000:02:00.0: not ready 65535ms after bus reset; giving up > > > pcieport 0000:00:1c.2: AER: Root Port link has been reset (-25) > > > pcieport 0000:00:1c.2: AER: subordinate device reset failed > > > pcieport 0000:00:1c.2: AER: device recovery fail > > > > But what exactly do you do? > > > > (1) __pm_runtime_disable(dev, false) > > (2) Check power state > > (a) If D0 (and device runtime-active), proceed > > (b) If > D0, remove power (if possible) and put into D0 > > > > or something else? > > I just did point (1), did not check power state (2). > I tested below patch with replaced: > > pm_runtime_get_sync -> __pm_runtime_disable(false) > pm_runtime_put -> pm_runtime_enable() > > I could try to test with (1)+(2), but now sure how do do step (2b), > by: > > pci_set_power_state(D3cold) > pci_set_power_state(D0) The problematic case is indeed when after __pm_runtime_disable(), device is in D3hot state. In such case we need to do the same operations as pci_pm_runtime_resume() does, otherwise AER code is not able to work. I think, just doing pm_runtime_get_sync() is better. While I'm able to reproduce D3hot & error state using aer_inject on the same smae device, more practical case is recovery running on all devices connected to a port (else case in pcie_do_recovery type == PCIE_EXP_TYPE* check). On such case some devices can be suspend, and from AER code comments I conclude they have to be reset. > +static int pci_pm_runtime_put(struct pci_dev *pdev, void *data) > +{ > + pm_runtime_put(&pdev->dev); > + return 0; > +} > + <snip> > + > + pci_walk_bridge(bridge, pci_pm_runtime_put, NULL); This can happen after driver is unbind from device. I had concern about that, but after drivers/base/ code examination, seems to be fine do do pm_runtime_put() after unbind. Regards Stanislaw