Hi, Daniel, Jonathan, On 15.02.2025 15:51, Claudiu Beznea wrote: > Hi, Greg, > > On 15.02.2025 15:25, Greg KH wrote: >> On Sat, Feb 15, 2025 at 03:08:49PM +0200, Claudiu wrote: >>> From: Claudiu Beznea <claudiu.beznea.uj@xxxxxxxxxxxxxx> >>> >>> On the Renesas RZ/G3S (and other Renesas SoCs, e.g., RZ/G2{L, LC, UL}), >>> clocks are managed through PM domains. These PM domains, registered on >>> behalf of the clock controller driver, are configured with >>> GENPD_FLAG_PM_CLK. In most of the Renesas drivers used by RZ SoCs, the >>> clocks are enabled/disabled using runtime PM APIs. The power domains may >>> also have power_on/power_off support implemented. After the device PM >>> domain is powered off any CPU accesses to these domains leads to system >>> aborts. >>> >>> During probe, devices are attached to the PM domain controlling their >>> clocks and power. Similarly, during removal, devices are detached from the >>> PM domain. >>> >>> The detachment call stack is as follows: >>> >>> device_driver_detach() -> >>> device_release_driver_internal() -> >>> __device_release_driver() -> >>> device_remove() -> >>> platform_remove() -> >>> dev_pm_domain_detach() >>> >>> During driver unbind, after the device is detached from its PM domain, >>> the device_unbind_cleanup() function is called, which subsequently invokes >>> devres_release_all(). This function handles devres resource cleanup. >>> >>> If runtime PM is enabled in driver probe via devm_pm_runtime_enable(), the >>> cleanup process triggers the action or reset function for disabling runtime >>> PM. This function is pm_runtime_disable_action(), which leads to the >>> following call stack of interest when called: >>> >>> pm_runtime_disable_action() -> >>> pm_runtime_dont_use_autosuspend() -> >>> __pm_runtime_use_autosuspend() -> >>> update_autosuspend() -> >>> rpm_idle() >>> >>> The rpm_idle() function attempts to resume the device at runtime. However, >>> at the point it is called, the device is no longer part of a PM domain >>> (which manages clocks and power states). If the driver implements its own >>> runtime PM APIs for specific functionalities - such as the rzg2l_adc >>> driver - while also relying on the power domain subsystem for power >>> management, rpm_idle() will invoke the driver's runtime PM API. However, >>> since the device is no longer part of a PM domain at this point, the PM >>> domain's runtime PM APIs will not be called. This leads to system aborts on >>> Renesas SoCs. >>> >>> Another identified case is when a subsystem performs various cleanups >>> using device_unbind_cleanup(), calling driver-specific APIs in the process. >>> A known example is the thermal subsystem, which may call driver-specific >>> APIs to disable the thermal device. The relevant call stack in this case >>> is: >>> >>> device_driver_detach() -> >>> device_release_driver_internal() -> >>> device_unbind_cleanup() -> >>> devres_release_all() -> >>> devm_thermal_of_zone_release() -> >>> thermal_zone_device_disable() -> >>> thermal_zone_device_set_mode() -> >>> struct thermal_zone_device_ops::change_mode() >>> >>> At the moment the driver-specific change_mode() API is called, the device >>> is no longer part of its PM domain. Accessing its registers without proper >>> power management leads to system aborts. >>> >>> Open a devres group before calling the driver probe, and close it >>> immediately after the driver remove function is called and before >>> dev_pm_domain_detach(). This ensures that driver-specific devm actions or >>> reset functions are executed immediately after the driver remove function >>> completes. Additionally, it prevents driver-specific runtime PM APIs from >>> being called when the device is no longer part of its power domain. >>> >>> Signed-off-by: Claudiu Beznea <claudiu.beznea.uj@xxxxxxxxxxxxxx> >>> --- >>> >>> Hi, >>> >>> Although Ulf gave its green light for the approaches on both IIO [1], >>> [2] and thermal subsystems [3], Jonathan considered unacceptable the >>> approaches in [1], [2] as he considered it may lead to dificult to >>> maintain code and code opened to subtle bugs (due to the potential of >>> mixing devres and non-devres calls). He pointed out a similar approach >>> that was done for the I2C bus [4], [5]. >>> >>> As the discussions in [1], [2] stopped w/o a clear conclusion, this >>> patch tries to revive it by proposing a similar approach that was done >>> for the I2C bus. >>> >>> Please let me know you input. >> >> I'm with Jonathan here, the devres stuff is getting crazy here and you >> have drivers mixing them and side affects happening and lots of >> confusion. Your change here is only going to make it even more >> confusing, and shouldn't actually solve it for other busses (i.e. what >> about iio devices NOT on the platform bus?) > > You're right, other busses will still have this problem. > >> >> Why can't your individual driver handle this instead? > > Initially I tried it at the driver level by using non-devres PM runtime > enable API but wasn't considered OK by all parties. > > I haven't thought about having devres_open_group()/devres_close_group() in > the driver itself but it should work. Are you OK with having the devres_open_group()/devres_close_group() in the currently known affected drivers (drivers/iio/adc/rzg2l_adc.c and the proposed drivers/thermal/renesas/rzg3s_thermal.c [1]) ? Thank you, Claudiu [1] https://lore.kernel.org/all/20250103163805.1775705-5-claudiu.beznea.uj@xxxxxxxxxxxxxx > > Thank you, > Claudiu > >> >> thanks, >> >> greg k-h >