On Wed, 15 Jan 2025 at 14:37, Claudiu Beznea <claudiu.beznea@xxxxxxxxx> wrote: > > Hi, Jonathan, > > Thank you for your input! > > On 11.01.2025 15:14, Jonathan Cameron wrote: > > On Mon, 6 Jan 2025 11:18:41 +0200 > > Claudiu Beznea <claudiu.beznea@xxxxxxxxx> wrote: > > > >> Hi, Jonathan, > >> > >> > >> On 04.01.2025 15:52, Jonathan Cameron wrote: > >>> On Fri, 3 Jan 2025 16:00:41 +0200 > >>> Claudiu <claudiu.beznea@xxxxxxxxx> wrote: > >>> > >>>> From: Claudiu Beznea <claudiu.beznea.uj@xxxxxxxxxxxxxx> > >>> +CC Rafael and linux-pm > >>> > >>>> > >>>> On all systems where the rzg2l_adc driver is used, the ADC clocks are part > >>>> of a PM domain. The code that implements the PM domains support is in > >>>> drivers/clk/renesas/rzg2l-cpg.c, the functions of interest for this commit > >>>> being rzg2l_cpg_attach_dev() and rzg2l_cpg_deattach_dev(). The PM > >>>> domains support is registered with GENPD_FLAG_PM_CLK which, according to > >>>> the documentation, instructs genpd to use the PM clk framework while > >>>> powering on/off attached devices. > >>>> > >>>> During probe, the ADC device is attached to the PM domain > >>>> controlling the ADC clocks. Similarly, during removal, the ADC device is > >>>> detached from the PM domain. > >>>> > >>>> The detachment call stack is as follows: > >>>> > >>>> device_driver_detach() -> > >>>> device_release_driver_internal() -> > >>>> __device_release_driver() -> > >>>> device_remove() -> > >>>> platform_remove() -> > >>>> dev_pm_domain_detach() > >>>> > >>>> During driver unbind, after the ADC device is detached from its PM domain, > >>>> the device_unbind_cleanup() function is called, which subsequently invokes > >>>> devres_release_all(). This function handles devres resource cleanup. > >>>> > >>>> If runtime PM is enabled via devm_pm_runtime_enable(), the cleanup process > >>>> triggers the action or reset function for disabling runtime PM. This > >>>> function is pm_runtime_disable_action(), which leads to the following call > >>>> stack of interest when called: > >>>> > >>>> pm_runtime_disable_action() -> > >>>> pm_runtime_dont_use_autosuspend() -> > >>> > >>> So is the only real difference that in the code below you disable runtime pm > >>> before autosuspend? > >> > >> No, the difference is that now, the driver specific runtime PM APIs are not > >> called anymore (through the pointed call stack) after the ADC was removed > >> from it's PM domain. > > > > By my reading they are only not called now because you turn autosuspend off > > after disabling runtime PM. > > Sorry, I wanted to say that the runtime PM APIs are not called anymore from > devm_action_release(), though this call stack: > > [ 24.801195] Call trace: > [ 24.803633] rzg2l_adc_pm_runtime_suspend+0x18/0x54 (P) > [ 24.808847] pm_generic_runtime_suspend+0x2c/0x44 (L) > [ 24.813887] pm_generic_runtime_suspend+0x2c/0x44 > [ 24.818580] __rpm_callback+0x48/0x198 > [ 24.822319] rpm_callback+0x68/0x74 > [ 24.825798] rpm_suspend+0x100/0x578 > [ 24.829362] rpm_idle+0xd0/0x17c > [ 24.832582] update_autosuspend+0x30/0xc4 > [ 24.836580] pm_runtime_disable_action+0x40/0x64 > [ 24.841184] devm_action_release+0x14/0x20 > [ 24.845274] devres_release_all+0xa0/0x100 > [ 24.849361] device_unbind_cleanup+0x18/0x60 > > This is because I dropped the devm_pm_runtime_enable() which registers the > pm_runtime_disable_action(), which is called at the time the > device_unbind_cleanup() is called, which is called when the ADC is not > anymore part of its PM domain. > > If I change the order in remove function proposed in this patch, thus do: > > +static void rzg2l_adc_remove(struct platform_device *pdev) > +{ > + struct device *dev = &pdev->dev; > + > + pm_runtime_dont_use_autosuspend(dev); > + pm_runtime_disable(dev); > } > > nothing changes with the behavior of this patch. There will be no issue if > the device is runtime suspended/resumed through the > pm_runtime_dont_use_autosuspend() because at the time the > rzg2l_adc_remove() is called the ADC is still part of the PM domain. > > > > > > >> > >> > >>> Can you still do that with a devm callback just not > >>> the standard one? > >> > >> No. It doesn't matter if we call the standard devm callback or driver > >> specific one. As long as it is devm it will impact us as long as the driver > >> specific runtime PM APIs are called through devres_release_all() after > >> dev_pm_domain_detach(). And at that time the PM domain may be off along > >> with its clocks disabled. > > > > As above, I think that this is only the case because of the reordering > > of those two calls, not something more fundamental. > > I tried having a local devm function (the following diff applied with this > patch reverted) identical with pm_runtime_disable_action(): > > diff --git a/drivers/iio/adc/rzg2l_adc.c b/drivers/iio/adc/rzg2l_adc.c > index 22a581c894f8..459cc9c67eec 100644 > --- a/drivers/iio/adc/rzg2l_adc.c > +++ b/drivers/iio/adc/rzg2l_adc.c > @@ -423,6 +423,12 @@ static int rzg2l_adc_hw_init(struct device *dev, > struct rzg2l_adc *adc) > return ret; > } > > +static void rzg2l_pm_runtime_disable(void *data) > +{ > + pm_runtime_dont_use_autosuspend(data); > + pm_runtime_disable(data); > +} > + > static int rzg2l_adc_probe(struct platform_device *pdev) > { > struct device *dev = &pdev->dev; > @@ -463,7 +469,9 @@ static int rzg2l_adc_probe(struct platform_device *pdev) > > pm_runtime_set_autosuspend_delay(dev, 300); > pm_runtime_use_autosuspend(dev); > - ret = devm_pm_runtime_enable(dev); > + pm_runtime_enable(dev); > + > + ret = devm_add_action_or_reset(dev, rzg2l_pm_runtime_disable, dev); > if (ret) > return ret; > > With this the issue is still reproducible. > > However, changing the order of functions in rzg2l_pm_runtime_disable() and > having it like: > > +static void rzg2l_pm_runtime_disable(void *data) > +{ > + pm_runtime_disable(data); > + pm_runtime_dont_use_autosuspend(data); > +} > > > leads to no failure when doing unbind/bind. > > However, I see the pm_runtime_disable() can still call rpm_resume() under > certain conditions. It can still lead to failures if it is called after the > device was remove from its PM domain. > > > > > In driver remove flow, device_unbind_cleanup9() is called > > just after device_remove() which is calling the dev->driver_remove() > > callback. There are no runtime pm related calls in between that I can see. > > On my side the device_remove() is calling dev->bus->remove() which is > platform_remove(), which calls the dev_pm_domain_detach(). The > dev_pm_domain_detach() detaches the ADC from it's PM domain. Because of > this, accessing now the ADC registers after a runtime resume leads to > failures pointed in this patch (as of my investigation) (as the ADC is not > anymore part of its PM domain and its PM domain is not started anymore > though runtime PM APIs). > > A similar issue was found while I was adding thermal support for RZ/G3S, > explained in > https://lore.kernel.org/all/20250103163805.1775705-3-claudiu.beznea.uj@xxxxxxxxxxxxxx > > > Jonathan, Rafael, Ulf, all, > > Do consider OK to change the order in pm_runtime_disable_action() to get > rid of these issues, e.g.: > > diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c > index 2ee45841486b..f27d311d2619 100644 > --- a/drivers/base/power/runtime.c > +++ b/drivers/base/power/runtime.c > @@ -1547,8 +1547,8 @@ EXPORT_SYMBOL_GPL(pm_runtime_enable); > > static void pm_runtime_disable_action(void *data) > { > - pm_runtime_dont_use_autosuspend(data); > pm_runtime_disable(data); > + pm_runtime_dont_use_autosuspend(data); > } > > though I see a rpm_resume() call is still possible though pm_runtime_disable(). I am still worried about keeping the device runtime enabled during a window when we have turned off all resources for the device. Typically we want to leave the device in a low power state after unbind. That said, I would rather just drop the devm_pm_runtime_enable() API altogether and convert all users of it into pm_runtime_enable|disable(), similar to what your patch does. > > Thank you, > Claudiu [...] Kind regards Uffe