Re: [PATCH 1/2] iio: adc: rzg2l_adc: Drop devm_pm_runtime_enable()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On 24.01.2025 20:41, Jonathan Cameron wrote:
> On Mon, 20 Jan 2025 15:59:02 +0100
> Ulf Hansson <ulf.hansson@xxxxxxxxxx> wrote:
> 
>> On Fri, 17 Jan 2025 at 16:52, Jonathan Cameron
>> <Jonathan.Cameron@xxxxxxxxxx> wrote:
>>>
>>> On Wed, 15 Jan 2025 16:29:15 +0100
>>> Ulf Hansson <ulf.hansson@xxxxxxxxxx> wrote:
>>>  
>>>> On Wed, 15 Jan 2025 at 14:37, Claudiu Beznea <claudiu.beznea@xxxxxxxxx> wrote:  
>>>>>
>>>>> Hi, Jonathan,
>>>>>
>>>>> Thank you for your input!
>>>>>
>>>>> On 11.01.2025 15:14, Jonathan Cameron wrote:  
>>>>>> On Mon, 6 Jan 2025 11:18:41 +0200
>>>>>> Claudiu Beznea <claudiu.beznea@xxxxxxxxx> wrote:
>>>>>>  
>>>>>>> Hi, Jonathan,
>>>>>>>
>>>>>>>
>>>>>>> On 04.01.2025 15:52, Jonathan Cameron wrote:  
>>>>>>>> On Fri,  3 Jan 2025 16:00:41 +0200
>>>>>>>> Claudiu <claudiu.beznea@xxxxxxxxx> wrote:
>>>>>>>>  
>>>>>>>>> From: Claudiu Beznea <claudiu.beznea.uj@xxxxxxxxxxxxxx>  
>>>>>>>> +CC Rafael and linux-pm
>>>>>>>>  
>>>>>>>>>
>>>>>>>>> On all systems where the rzg2l_adc driver is used, the ADC clocks are part
>>>>>>>>> of a PM domain. The code that implements the PM domains support is in
>>>>>>>>> drivers/clk/renesas/rzg2l-cpg.c, the functions of interest for this commit
>>>>>>>>> being rzg2l_cpg_attach_dev() and rzg2l_cpg_deattach_dev(). The PM
>>>>>>>>> domains support is registered with GENPD_FLAG_PM_CLK which, according to
>>>>>>>>> the documentation, instructs genpd to use the PM clk framework while
>>>>>>>>> powering on/off attached devices.
>>>>>>>>>
>>>>>>>>> During probe, the ADC device is attached to the PM domain
>>>>>>>>> controlling the ADC clocks. Similarly, during removal, the ADC device is
>>>>>>>>> detached from the PM domain.
>>>>>>>>>
>>>>>>>>> The detachment call stack is as follows:
>>>>>>>>>
>>>>>>>>> device_driver_detach() ->
>>>>>>>>>   device_release_driver_internal() ->
>>>>>>>>>     __device_release_driver() ->
>>>>>>>>>       device_remove() ->
>>>>>>>>>         platform_remove() ->
>>>>>>>>>           dev_pm_domain_detach()
>>>>>>>>>
>>>>>>>>> During driver unbind, after the ADC device is detached from its PM domain,
>>>>>>>>> the device_unbind_cleanup() function is called, which subsequently invokes
>>>>>>>>> devres_release_all(). This function handles devres resource cleanup.
>>>>>>>>>
>>>>>>>>> If runtime PM is enabled via devm_pm_runtime_enable(), the cleanup process
>>>>>>>>> triggers the action or reset function for disabling runtime PM. This
>>>>>>>>> function is pm_runtime_disable_action(), which leads to the following call
>>>>>>>>> stack of interest when called:
>>>>>>>>>
>>>>>>>>> pm_runtime_disable_action() ->
>>>>>>>>>   pm_runtime_dont_use_autosuspend() ->  
>>>>>>>>
>>>>>>>> So is the only real difference that in the code below you disable runtime pm
>>>>>>>> before autosuspend?  
>>>>>>>
>>>>>>> No, the difference is that now, the driver specific runtime PM APIs are not
>>>>>>> called anymore (through the pointed call stack) after the ADC was removed
>>>>>>> from it's PM domain.  
>>>>>>
>>>>>> By my reading they are only not called now because you turn autosuspend off
>>>>>> after disabling runtime PM.  
>>>>>
>>>>> Sorry, I wanted to say that the runtime PM APIs are not called anymore from
>>>>> devm_action_release(), though this call stack:
>>>>>
>>>>> [   24.801195] Call trace:
>>>>> [   24.803633]  rzg2l_adc_pm_runtime_suspend+0x18/0x54 (P)
>>>>> [   24.808847]  pm_generic_runtime_suspend+0x2c/0x44 (L)
>>>>> [   24.813887]  pm_generic_runtime_suspend+0x2c/0x44
>>>>> [   24.818580]  __rpm_callback+0x48/0x198
>>>>> [   24.822319]  rpm_callback+0x68/0x74
>>>>> [   24.825798]  rpm_suspend+0x100/0x578
>>>>> [   24.829362]  rpm_idle+0xd0/0x17c
>>>>> [   24.832582]  update_autosuspend+0x30/0xc4
>>>>> [   24.836580]  pm_runtime_disable_action+0x40/0x64
>>>>> [   24.841184]  devm_action_release+0x14/0x20
>>>>> [   24.845274]  devres_release_all+0xa0/0x100
>>>>> [   24.849361]  device_unbind_cleanup+0x18/0x60
>>>>>
>>>>> This is because I dropped the devm_pm_runtime_enable() which registers the
>>>>> pm_runtime_disable_action(), which is called at the time the
>>>>> device_unbind_cleanup() is called, which is called when the ADC is not
>>>>> anymore part of its PM domain.
>>>>>
>>>>> If I change the order in remove function proposed in this patch, thus do:
>>>>>
>>>>> +static void rzg2l_adc_remove(struct platform_device *pdev)
>>>>> +{
>>>>> +       struct device *dev = &pdev->dev;
>>>>> +
>>>>> +       pm_runtime_dont_use_autosuspend(dev);
>>>>> +       pm_runtime_disable(dev);
>>>>>  }
>>>>>
>>>>> nothing changes with the behavior of this patch. There will be no issue if
>>>>> the device is runtime suspended/resumed through the
>>>>> pm_runtime_dont_use_autosuspend() because at the time the
>>>>> rzg2l_adc_remove() is called the ADC is still part of the PM domain.
>>>>>
>>>>>
>>>>>  
>>>>>>  
>>>>>>>
>>>>>>>  
>>>>>>>>  Can you still do that with a devm callback just not
>>>>>>>> the standard one?  
>>>>>>>
>>>>>>> No. It doesn't matter if we call the standard devm callback or driver
>>>>>>> specific one. As long as it is devm it will impact us as long as the driver
>>>>>>> specific runtime PM APIs are called through devres_release_all() after
>>>>>>> dev_pm_domain_detach(). And at that time the PM domain may be off along
>>>>>>> with its clocks disabled.  
>>>>>>
>>>>>> As above, I think that this is only the case because of the reordering
>>>>>> of those two calls, not something more fundamental.  
>>>>>
>>>>> I tried having a local devm function (the following diff applied with this
>>>>> patch reverted) identical with pm_runtime_disable_action():
>>>>>
>>>>> diff --git a/drivers/iio/adc/rzg2l_adc.c b/drivers/iio/adc/rzg2l_adc.c
>>>>> index 22a581c894f8..459cc9c67eec 100644
>>>>> --- a/drivers/iio/adc/rzg2l_adc.c
>>>>> +++ b/drivers/iio/adc/rzg2l_adc.c
>>>>> @@ -423,6 +423,12 @@ static int rzg2l_adc_hw_init(struct device *dev,
>>>>> struct rzg2l_adc *adc)
>>>>>         return ret;
>>>>>  }
>>>>>
>>>>> +static void rzg2l_pm_runtime_disable(void *data)
>>>>> +{
>>>>> +       pm_runtime_dont_use_autosuspend(data);
>>>>> +       pm_runtime_disable(data);
>>>>> +}
>>>>> +
>>>>>  static int rzg2l_adc_probe(struct platform_device *pdev)
>>>>>  {
>>>>>         struct device *dev = &pdev->dev;
>>>>> @@ -463,7 +469,9 @@ static int rzg2l_adc_probe(struct platform_device *pdev)
>>>>>
>>>>>         pm_runtime_set_autosuspend_delay(dev, 300);
>>>>>         pm_runtime_use_autosuspend(dev);
>>>>> -       ret = devm_pm_runtime_enable(dev);
>>>>> +       pm_runtime_enable(dev);
>>>>> +
>>>>> +       ret = devm_add_action_or_reset(dev, rzg2l_pm_runtime_disable, dev);
>>>>>         if (ret)
>>>>>                 return ret;
>>>>>
>>>>> With this the issue is still reproducible.
>>>>>
>>>>> However, changing the order of functions in rzg2l_pm_runtime_disable() and
>>>>> having it like:
>>>>>
>>>>> +static void rzg2l_pm_runtime_disable(void *data)
>>>>> +{
>>>>> +       pm_runtime_disable(data);
>>>>> +       pm_runtime_dont_use_autosuspend(data);
>>>>> +}
>>>>>
>>>>>
>>>>> leads to no failure when doing unbind/bind.
>>>>>
>>>>> However, I see the pm_runtime_disable() can still call rpm_resume() under
>>>>> certain conditions. It can still lead to failures if it is called after the
>>>>> device was remove from its PM domain.
>>>>>  
>>>>>>
>>>>>> In driver remove flow, device_unbind_cleanup9() is called
>>>>>> just after device_remove() which is calling the dev->driver_remove()
>>>>>> callback. There are no runtime pm related calls in between that I can see.  
>>>>>
>>>>> On my side the device_remove() is calling dev->bus->remove() which is
>>>>> platform_remove(), which calls the dev_pm_domain_detach(). The
>>>>> dev_pm_domain_detach() detaches the ADC from it's PM domain. Because of
>>>>> this, accessing now the ADC registers after a runtime resume leads to
>>>>> failures pointed in this patch (as of my investigation) (as the ADC is not
>>>>> anymore part of its PM domain and its PM domain is not started anymore
>>>>> though runtime PM APIs).
>>>>>
>>>>> A similar issue was found while I was adding thermal support for RZ/G3S,
>>>>> explained in
>>>>> https://lore.kernel.org/all/20250103163805.1775705-3-claudiu.beznea.uj@xxxxxxxxxxxxxx
>>>>>
>>>>>
>>>>> Jonathan, Rafael, Ulf, all,
>>>>>
>>>>> Do consider OK to change the order in pm_runtime_disable_action() to get
>>>>> rid of these issues, e.g.:
>>>>>
>>>>> diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c
>>>>> index 2ee45841486b..f27d311d2619 100644
>>>>> --- a/drivers/base/power/runtime.c
>>>>> +++ b/drivers/base/power/runtime.c
>>>>> @@ -1547,8 +1547,8 @@ EXPORT_SYMBOL_GPL(pm_runtime_enable);
>>>>>
>>>>>  static void pm_runtime_disable_action(void *data)
>>>>>  {
>>>>> -       pm_runtime_dont_use_autosuspend(data);
>>>>>         pm_runtime_disable(data);
>>>>> +       pm_runtime_dont_use_autosuspend(data);
>>>>>  }
>>>>>
>>>>> though I see a rpm_resume() call is still possible though pm_runtime_disable().  
>>>>
>>>> I am still worried about keeping the device runtime enabled during a
>>>> window when we have turned off all resources for the device. Typically
>>>> we want to leave the device in a low power state after unbind.
>>>>
>>>> That said, I would rather just drop the devm_pm_runtime_enable() API
>>>> altogether and convert all users of it into
>>>> pm_runtime_enable|disable(), similar to what your patch does.  
>>>
>>> That is making a mess of a lot of automated cleanup for a strange
>>> runtime pm related path.  This is pain a driver should not have
>>> to deal with, though I'm not clear what the right solution is!
>>>
>>> Key is that drivers should not mix devm managed cleanup and not, so
>>> that means that anything that happens after runtime pm is enabled
>>> has to be torn down manually.  One solution to this might be to
>>> always enable it late assuming that is safe to do so there is
>>> never anything else done after it in the probe path of a driver.  
>>
>> The problem is that runtime PM isn't really comparable to other
>> resources that we are managing through devm* functions.
>>
>> Enabling runtime PM for a device changes the behaviour for how
>> power-mgmt is handled for the device. Enabling/disabling of runtime PM
>> really needs to be explicitly controlled by the driver for the device.
> 
> I'm sorry to say I'm not yet convinced.
> 
> Devm callbacks are explicitly registered by the driver so that they
> are unwound in a specific order.  Many other parts of driver
> registration rely on this ordering.  This does not seem different
> for runtime PM than anything else.
> 
> Superficially the issue here looks to me to be that a non devm
> cleanup is inserted by the bus->remove() callback into drivers
> that are otherwise relying on explicit ordering provided
> by managed cleanup.
> 
> I appreciate there may be no trivial solution.
> 
> Maybe we can minimize that impact by always doing runtime pm last
> in any probe() function?  Can that work here?

It will not work, AFAICT. E.g., some hardware need initialization before,
e.g., registering a device for it in the proper subsystem. Runtime PM may
help with enabling, e.g., clocks, for that hardware initialization to work.

> 
> 
>>
>> If there are cleanups to be made for runtime PM, beyond disabling
>> runtime PM, we could instead consider adding that code in
>> pm_runtime_reinit().
> 
> I'm not familiar enough to comment on this option beyond it being
> after devres_release_all() so, maybe?  It seems superficially
> a better point in the sequence.

I only need disabling runtime PM in my usecase.

One thing that I've experimented with is to drop the dev_pm_domain_detach()
call from platform_remove() and add it in device_unbind_cleanup(). I
checked with this diff:

1/

diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index f0e4b4aba885..e658c2d642be 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -544,6 +544,7 @@ static ssize_t state_synced_show(struct device *dev,
 }
 static DEVICE_ATTR_RW(state_synced);

+extern void dev_pm_domain_detach(struct device *dev, bool power_off);
 static void device_unbind_cleanup(struct device *dev)
 {
        devres_release_all(dev);
@@ -554,6 +555,7 @@ static void device_unbind_cleanup(struct device *dev)
        dev_set_drvdata(dev, NULL);
        if (dev->pm_domain && dev->pm_domain->dismiss)
                dev->pm_domain->dismiss(dev);
+       dev_pm_domain_detach(dev, true);
        pm_runtime_reinit(dev);
        dev_pm_set_driver_flags(dev, 0);
 }
diff --git a/drivers/base/platform.c b/drivers/base/platform.c
index 6f2a33722c52..bf2f8d39d184 100644
--- a/drivers/base/platform.c
+++ b/drivers/base/platform.c
@@ -1422,7 +1422,6 @@ static void platform_remove(struct device *_dev)

        if (drv->remove)
                drv->remove(dev);
-       dev_pm_domain_detach(_dev, true);
 }

And also, with this one:

2/

diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index f0e4b4aba885..216d765fbfcd 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -544,9 +544,11 @@ static ssize_t state_synced_show(struct device *dev,
 }
 static DEVICE_ATTR_RW(state_synced);

+extern void dev_pm_domain_detach(struct device *dev, bool power_off);
 static void device_unbind_cleanup(struct device *dev)
 {
        devres_release_all(dev);
+       dev_pm_domain_detach(dev, true);
        arch_teardown_dma_ops(dev);
        kfree(dev->dma_range_map);
        dev->dma_range_map = NULL;
diff --git a/drivers/base/platform.c b/drivers/base/platform.c
index 6f2a33722c52..bf2f8d39d184 100644
--- a/drivers/base/platform.c
+++ b/drivers/base/platform.c
@@ -1422,7 +1422,6 @@ static void platform_remove(struct device *_dev)

        if (drv->remove)
                drv->remove(dev);
-       dev_pm_domain_detach(_dev, true);
 }

Both of these work on my side (when doing continuous unbind/bind) but I'm
not sure what are the implications of it. If all good, maybe we can add a
devm helper for dev_pm_domain_attach() and use it in platform_probe(). With
this it will be guaranteed that the devm_pm_domain_detach() will be the
last cleanup helper that will be called on remove for the platform bus.
What do you think?

Thank you,
Claudiu

> 
> Jonathan
>>
>> [...]
>>
>> Kind regards
>> Uffe
> 





[Index of Archives]     [Linux Samsung SOC]     [Linux Wireless]     [Linux Kernel]     [ATH6KL]     [Linux Bluetooth]     [Linux Netdev]     [Kernel Newbies]     [IDE]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux ATA RAID]     [Samba]     [Device Mapper]

  Powered by Linux