On Mon, Feb 10, 2025 at 10:31 AM Johan Hovold <johan@xxxxxxxxxx> wrote: > > On Sat, Feb 08, 2025 at 06:54:28PM +0100, Rafael J. Wysocki wrote: > > From: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx> > > > > Commit 3775fc538f53 ("PM: sleep: core: Synchronize runtime PM status of > > parents and children") exposed an issue related to simple_pm_bus_pm_ops > > that uses pm_runtime_force_suspend() and pm_runtime_force_resume() as > > bus type PM callbacks for the noirq phases of system-wide suspend and > > Despite the name of the driver these are plain device driver PM > callbacks (not bus PM ops). Right, I was confused. BTW, I still think that it is invalid to use pm_runtime_force_suspend() and pm_runtime_force_resume() as system-wide PM callbacks without enabling runtime PM. The system-wide PM callbacks should be wrappers around these checking something (eg. dev_get_drvdata(dev)) other than dev->power.runtime_status. It is too fragile the current way. > > resume. > > > > The problem is that pm_runtime_force_suspend() does not distinguish > > runtime-suspended devices from devices for which runtime PM has never > > been enabled, so if it sees a device with runtime PM status set to > > RPM_ACTIVE, it will assume that runtime PM is enabled for that device > > and so it will attempt to suspend it with the help of its runtime PM > > callbacks which may not be ready for that. As it turns out, this > > causes simple_pm_bus_runtime_suspend() to crash due to a NULL pointer > > dereference. > > > > Another problem related to the above commit and simple_pm_bus_pm_ops is > > that setting runtime PM status of a device handled by the latter to > > RPM_ACTIVE will actually prevent it from being resumed because > > pm_runtime_force_resume() only resumes devices with runtime PM status > > set to RPM_SUSPENDED. > > > > To mitigate these issues, do not allow power.set_active to propagate > > beyond the parent of the device with DPM_FLAG_SMART_SUSPEND set that > > will need to be resumed, which should be a sufficient stop-gap for the > > time being, but they will need to be properly addressed in the future > > because in general during system-wide resume it is necessary to resume > > all devices in a dependency chain in which at least one device is going > > to be resumed. > > So this works as long as no parent of a device with > DPM_FLAG_SMART_SUSPEND set is using pm_runtime_force_resume(). > > This is the case in the systems I work on, but have you > verified that this is currently generally true? Not many drivers use > this flag, but it all depends on what their devices' parents' drivers > do: > > drivers/acpi/acpi_tad.c > drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c > drivers/i2c/busses/i2c-designware-platdrv.c > drivers/mfd/intel-lpss.c > drivers/pci/pcie/portdrv.c > drivers/pwm/pwm-lpss-platform.c > drivers/soundwire/intel_auxdevice.c > > Most of these look like ACPI drivers so nothing that would sit directly > on a simple-pm-bus at least. I'm not worried about these. Also, I have a patch series to get the power.set_active propagation into agreement with pm_runtime_force_suspend/resume(). I'll post it tomorrow.