Re: [RFC/PATCH] clk: samsung: exynos5433: Fix NOIRQ suspend/resume support

Ulf Hansson <ulf.hansson@xxxxxxxxxx> · Mon, 1 Oct 2018 15:30:41 +0200

On 1 October 2018 at 14:34, Marek Szyprowski <m.szyprowski@xxxxxxxxxxx> wrote:
> Hi Rafael and Ulf,
>
> I understand your position on using runtime PM during system
> suspend/resume, but
> I would like add a few more words on why I need it.
>
> On 2018-09-20 19:29, Ulf Hansson wrote:
>> [...]
>>>>> The unconditional calls to pm_runtime_enable and pm_runtime_enable
>>>>> in device_suspend_late and device_resume_early core functions were a bit
>>>>> surprising for me, because they make NOIRQ phase a special case from
>>>>> the perspective of runtime PM handling. Using LATE system sleep phase
>>>>> to call pm_runtime_enable/disable looks like a workaround for me, but right
>>>>> now I have no other idea how to ensure proper behavior of the Exynos5433
>>>>> CMU driver.
>>>>>
>>>>> The most strange thing is that I didn't observe this initially after
>>>>> switching to NOIRQ phase. The CMU device (FSYS_CMU) which is not under
>>>>> any power domain behaved correctly and was properly available for DWMMC
>>>>> device, which needs to enable clocks in its noirq_resume callback.
>>>>>
>>>>> However, in case of AUD_CMU and its clients (for example serial_3 device,
>>>>> which is a part of audio-subsytem), the pm_runtime_get_sync() called from
>>>>> clk_prepare() failed, because AUD_CMU device had still disabled runtime PM
>>>>> support. The only difference between FSYS_CMU and AUD_CMU, is the fact
>>>>> that AUD_CMU is under AUD power domain, while FSYS_CMU device has no power
>>>>> domain assigned (for completely other reasons FSYS power domains is not yet
>>>>> instantiated in exynos5433.dtsi).
>>>>>
>>>>> Ulf/Rafael: could You comment a bit on this issue? How it should be solved
>>>>> properly?
>>>> In general, this sounds like the classical problem of
>>>> suspending/resuming devices in a correct order. The long term solution
>>>> is to use device links to address this problem.
>>> Device links won't help in this case. This is not a problem of correct
>>> order of suspending/resuming devices. exynos5433 cmu device is already
>>> suspended after its client's and resumed before them. The problem here
>>> is using runtime PM in exynos5433 CMU driver during NOIRQ suspend/resume
>>> stage.
>>>
>> I see.
>>
>> However, I am guessing that one of the reason to why the client's are
>> also using the noirq phase is because of getting a the correct order.
>> The point is, the different level of PM callbacks (prepare, suspend,
>> suspend_late, suspend_noirq etc) aren't really the right solution to
>> get a correct order, even if that is what works for most cases.
>
> Nope, in both cases clients (dw-mmc-exynos and samsung serial/uart driver)
> use NOIRQ resume stage, because they want to play with interrupts (avoid
> interrupt storm) on resume. The only way to avoid NOIRQ phase in them is
> to manually mask their irqs on suspend and unmask after fixing interrupts
> related registers, but using NOIRQ was preferred solution.

I see.

>
> The main problem is that both clients needs to enable clocks to access
> their registers in NOIRQ resume, thus their clock provider has to be able
> to runtime resume as well. clk_prepare() will call pm_runtime_get_sync()
> on clock provider device, but first someone has to call pm_runtime_enable()
> on it, otherwise pm_runtimge_get_sync() fails.

In principle the sequence you are describing doesn't sound like a very
unique case, so yes, I fully agree that it's problematic.

So far, the solution to this problem has been to resume devices
unconditionally, instead of using runtime PM for example. In this
case, resuming would need to be done even as early as in the noirq
phase.

Allowing runtime PM to stay enabled, instead of disabling it from PM
core in the device suspend late phase, could be an option. However, in
such case in needs to be done on case by case basis, like an opt-in
solution, otherwise it may cause problems. Rafael, what do you think,
is this doable?

>
>> In principle, if all devices dependencies were hooked up correctly
>> with device links, we would in principle only need one level of
>> suspend/resume callbacks.
>>
>>>> However, as a simple fix, I would try to add a ->prepare() callback
>>>> and call pm_runtime_get_sync() from it and then in a new corresponding
>>>> ->complete() callback, call pm_runtime_put().
>>>>
>>>> This means that the device will stay runtime resumed until the
>>>> suspend_noirq phase. It would also means the pm_runtime_force_resume()
>>>> will resume the device at resume_noirq phase. Would that work?
>>> Yes, this will work, but it will unnecessarily wake all instances of
>>> exynos5433 cmu and their power domains during the system suspend/resume
>>> cycle, what I would like to avoid.
>> I guess what you are saying is that it's not at every suspend sequence
>> that the client's needs to runtime resume the exynos5433 cmu device -
>> and in those cases when not  needed, runtime resuming it becomes a
>> waste in regards to both time and energy. Right?
>
> Exactly. The other problem is that clock provider has to ensure that
> calling pm_runtime_get_sync() on its device during NOIRQ phase will
> not fail to let clients to properly prepare+enable their clocks.

Okay, thanks for confirming.

>
>> Anyway, I fully agree that this isn't very nice, but unfortunate
>> that's the best I can think of as a simple solution.
>
>> [...]
>>

Kind regards
Uffe