Re: [PATCH v2 3/3] clk: Drop the rate range on clk_put

Marek Szyprowski <m.szyprowski@xxxxxxxxxxx> · Thu, 31 Mar 2022 11:56:51 +0200

Hi,

On 30.03.2022 10:47, Maxime Ripard wrote:
> On Wed, Mar 30, 2022 at 10:06:13AM +0200, Marek Szyprowski wrote:
>> On 25.03.2022 17:11, Maxime Ripard wrote:
>>> While the current code will trigger a new clk_set_rate call whenever the
>>> rate boundaries are changed through clk_set_rate_range, this doesn't
>>> occur when clk_put() is called.
>>>
>>> However, this is essentially equivalent since, after clk_put()
>>> completes, those boundaries won't be enforced anymore.
>>>
>>> Let's add a call to clk_set_rate_range in clk_put to make sure those
>>> rate boundaries are dropped and the clock drivers can react.
>>>
>>> Let's also add a few tests to make sure this case is covered.
>>>
>>> Fixes: c80ac50cbb37 ("clk: Always set the rate on clk_set_range_rate")
>>> Signed-off-by: Maxime Ripard <maxime@xxxxxxxxxx>
>> This patch landed recently in linux-next 20220328 as commit 7dabfa2bc480
>> ("clk: Drop the rate range on clk_put()"). Sadly it breaks booting of
>> a few of my test systems: Samsung ARM 32bit Exynos3250 based Rinato
>> board and all Amlogic Meson G12B/SM1 based boards (Odroid C4, N2, Khadas
>> VIM3/VIM3l). Rinato hangs always with the following oops:
>>
>> --->8---
>>
>> Kernel panic - not syncing: MCT hangs after writing 4 (offset:0x420)
>> CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.17.0-rc1-00014-g7dabfa2bc480
>> #11551
>> Hardware name: Samsung Exynos (Flattened Device Tree)
>>    unwind_backtrace from show_stack+0x10/0x14
>>    show_stack from dump_stack_lvl+0x58/0x70
>>    dump_stack_lvl from panic+0x10c/0x328
>>    panic from exynos4_mct_tick_stop+0x0/0x2c
>> ---[ end Kernel panic - not syncing: MCT hangs after writing 4
>> (offset:0x420) ]---
>>
>> --->8---
>>
>> Amlogic boards hang randomly during early userspace init, usually just
>> after loading the driver modules.
>>
>> Reverting $subject on top of linux-next fixes all those problems.
>>
>> I will try to analyze it a bit more and if possible provide some more
>> useful/meaningful logs later.
> I'm not sure what could go wrong there, but if you can figure out the
> clock, if it tries to set a new rate and what rate it is, it would be
> awesome :)

So far I've noticed that the problem is caused by setting the rate of
some clocks to zero. The following patch fixes my issues:

diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c
index 32a9eaf35c6b..39cab08dbecb 100644
--- a/drivers/clk/clk.c
+++ b/drivers/clk/clk.c
@@ -2201,6 +2201,9 @@ static int clk_core_set_rate_nolock(struct clk_core *core,
         if (!core)
                 return 0;

+       if (req_rate == 0)
+               return 0;
+
         rate = clk_core_req_round_rate_nolock(core, req_rate);

         /* bail early if nothing to do */
--
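
For illustration only, here is a simplified userspace sketch of what the
guard above changes. It is not kernel code: struct toy_clk, toy_clamp(),
toy_set_rate() and toy_drop_range() are made-up names, and the idea that
dropping the range re-applies a never-set (zero) requested rate is an
assumption that still has to be confirmed by the call stack.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-in for struct clk_core; not the kernel structure. */
struct toy_clk {
	unsigned long rate;     /* rate currently programmed */
	unsigned long req_rate; /* last rate a consumer asked for, 0 if none */
};

static unsigned long toy_clamp(unsigned long v, unsigned long lo,
			       unsigned long hi)
{
	return v < lo ? lo : (v > hi ? hi : v);
}

/*
 * Models the early returns of the patched function. With guard != 0 a
 * request for 0 Hz is ignored, as in the diff; with guard == 0 it is
 * applied, which models the behaviour before the patch.
 */
static int toy_set_rate(struct toy_clk *clk, unsigned long req_rate, int guard)
{
	if (!clk)
		return 0;

	if (guard && req_rate == 0)
		return 0;

	clk->rate = req_rate;
	clk->req_rate = req_rate;
	return 0;
}

/*
 * ASSUMPTION: dropping the range re-applies the stored requested rate,
 * clamped to the widest possible boundaries [0, ULONG_MAX].
 */
static int toy_drop_range(struct toy_clk *clk, int guard)
{
	return toy_set_rate(clk, toy_clamp(clk->req_rate, 0, ULONG_MAX), guard);
}
```

In this model, a clock running at 24 MHz whose consumer never requested
a rate ends up at 0 Hz after toy_drop_range(&clk, 0), and keeps 24 MHz
after toy_drop_range(&clk, 1).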

I will soon grab the call stack and the relevant clock topology to show
how the rate is set to zero.

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland