On Thu, Mar 31, 2022 at 11:56:51AM +0200, Marek Szyprowski wrote:
> Hi,
>
> On 30.03.2022 10:47, Maxime Ripard wrote:
> > On Wed, Mar 30, 2022 at 10:06:13AM +0200, Marek Szyprowski wrote:
> >> On 25.03.2022 17:11, Maxime Ripard wrote:
> >>> While the current code will trigger a new clk_set_rate call whenever the
> >>> rate boundaries are changed through clk_set_rate_range, this doesn't
> >>> occur when clk_put() is called.
> >>>
> >>> However, this is essentially equivalent since, after clk_put()
> >>> completes, those boundaries won't be enforced anymore.
> >>>
> >>> Let's add a call to clk_set_rate_range in clk_put to make sure those
> >>> rate boundaries are dropped and the clock drivers can react.
> >>>
> >>> Let's also add a few tests to make sure this case is covered.
> >>>
> >>> Fixes: c80ac50cbb37 ("clk: Always set the rate on clk_set_range_rate")
> >>> Signed-off-by: Maxime Ripard <maxime@xxxxxxxxxx>
> >> This patch landed recently in linux-next 20220328 as commit 7dabfa2bc480
> >> ("clk: Drop the rate range on clk_put()"). Sadly it breaks booting of
> >> the few of my test systems: Samsung ARM 32bit Exynos3250 based Rinato
> >> board and all Amlogic Meson G12B/SM1 based boards (Odroid C4, N2, Khadas
> >> VIM3/VIM3l). Rinato hangs always with the following oops:
> >>
> >> --->8---
> >>
> >> Kernel panic - not syncing: MCT hangs after writing 4 (offset:0x420)
> >> CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.17.0-rc1-00014-g7dabfa2bc480
> >> #11551
> >> Hardware name: Samsung Exynos (Flattened Device Tree)
> >>  unwind_backtrace from show_stack+0x10/0x14
> >>  show_stack from dump_stack_lvl+0x58/0x70
> >>  dump_stack_lvl from panic+0x10c/0x328
> >>  panic from exynos4_mct_tick_stop+0x0/0x2c
> >> ---[ end Kernel panic - not syncing: MCT hangs after writing 4
> >> (offset:0x420) ]---
> >>
> >> --->8---
> >>
> >> Amlogic boards hang randomly during early userspace init, usually just
> >> after loading the driver modules.
> >>
> >> Reverting $subject on top of linux-next fixes all those problems.
> >>
> >> I will try to analyze it a bit more and if possible provide some more
> >> useful/meaning full logs later.
> > I'm not sure what could go wrong there, but if you can figure out the
> > clock, if it tries to set a new rate and what rate it is, it would be
> > awesome :)
>
> So far I've noticed that the problem is caused by setting rate of some
> clocks to zero. The following patch fixes my issues:
>
> diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c
> index 32a9eaf35c6b..39cab08dbecb 100644
> --- a/drivers/clk/clk.c
> +++ b/drivers/clk/clk.c
> @@ -2201,6 +2201,9 @@ static int clk_core_set_rate_nolock(struct
> clk_core *core,
>  	if (!core)
>  		return 0;
>
> +	if (req_rate == 0)
> +		return 0;
> +
>  	rate = clk_core_req_round_rate_nolock(core, req_rate);
>
>  	/* bail early if nothing to do */
> --
>
> I will soon grab the call stack and relevant clock topology show how the
> rate is set to zero.

The most likely thing to happen is that clk_set_rate_range will call
clk_core_set_rate_nolock with clk_core->req_rate, and at the time
req_rate is at 0.

And I'm a bit puzzled at this point, the only reason I could spot for
req_rate to be at 0 is that it's an orphan clock that doesn't have its
parent yet, but during userspace init I'd expect all the clocks to have
been registered.

Can you check if that clock is still orphan?

Maxime
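
To make the suspected path easier to follow, here is a minimal user-space
sketch of it. This is not the kernel code: struct toy_clk and every toy_*
function are hypothetical stand-ins, and the sketch assumes that changing the
range re-applies req_rate clamped to the new boundaries, and that clk_put()
resets the range to [0, ULONG_MAX].

```c
#include <limits.h>

/* Hypothetical stand-in for struct clk_core; not the real kernel layout. */
struct toy_clk {
	unsigned long req_rate;	/* last requested rate, 0 if never set (e.g. orphan) */
	unsigned long rate;	/* rate currently programmed in "hardware" */
	unsigned long min_rate;
	unsigned long max_rate;
};

/* Restrict v to the inclusive interval [lo, hi]. */
static unsigned long toy_clamp(unsigned long v, unsigned long lo,
			       unsigned long hi)
{
	if (v < lo)
		return lo;
	if (v > hi)
		return hi;
	return v;
}

/* Models the set-rate step: programs whatever rate it is handed. */
static int toy_set_rate(struct toy_clk *clk, unsigned long rate)
{
	clk->rate = rate;
	return 0;
}

/* Assumption: a range change re-applies req_rate within the new range. */
static int toy_set_rate_range(struct toy_clk *clk, unsigned long min,
			      unsigned long max)
{
	if (min > max)
		return -1;

	clk->min_rate = min;
	clk->max_rate = max;

	return toy_set_rate(clk, toy_clamp(clk->req_rate, min, max));
}

/* Assumption: releasing the clock drops the range to [0, ULONG_MAX]. */
static int toy_clk_put(struct toy_clk *clk)
{
	return toy_set_rate_range(clk, 0, ULONG_MAX);
}
```

With req_rate still at 0, toy_clk_put() ends up programming a rate of 0 even
if the clock was running at, say, 24 MHz beforehand, which would match the
symptom Marek describes. With a non-zero req_rate the same call just
re-applies that rate.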