Hi Matthias Sir, On Mon, 2023-02-06 at 13:09 +0100, Matthias Brugger wrote: > > On 06/02/2023 03:01, Roger Lu (陸瑞傑) wrote: > > Hi Matthias Sir, > > > > > > On Thu, 2023-02-02 at 11:29 +0100, Matthias Brugger wrote: > > > 你好, > > > > I got shock and thought someone used your name to reply. However, > > your email account helps me clear my mind. Haha.. Nice and warm to see > > mandarin > > on patchwork. It's so fresh and exciting :-). > > > > 谢谢。 I'm learning mainland Chinese for a few month now, I also learned that > you > use different symbols in Taiwan, which I don't know. 对不起。 Ha. Both symbols are welcome to me. :-) > > > > > > > On 01/02/2023 13:28, Roger Lu (陸瑞傑) wrote: > > > > Hi Matthias Sir, > > > > > > > > On Tue, 2023-01-31 at 14:19 +0100, Matthias Brugger wrote: > > > > > > > > > > On 11/01/2023 08:45, Roger Lu wrote: > > > > > > In MediaTek HW design, svs and thermal both use the same clk source. > > > > > > It means that svs clk reference count from CCF includes thermal > > > > > > control > > > > > > counts. That makes svs driver confuse whether it disabled svs's main > > > > > > clk > > > > > > or not from CCF's perspective and lead to turn off their shared clk > > > > > > unexpectedly. Therefore, we add svs clk control APIs to make sure > > > > > > svs's > > > > > > main clk is controlled well by svs driver itself. > > > > > > > > > > > > Here is a NG example. Rely on CCF's reference count and cause > > > > > > problem. > > > > > > > > > > > > thermal probe (clk ref = 1) > > > > > > -> svs probe (clk ref = 2) > > > > > > -> svs suspend (clk ref = 1) > > > > > > -> thermal suspend (clk ref = 0) > > > > > > -> thermal resume (clk ref = 1) > > > > > > -> svs resume (encounter error, clk ref = 1) > > > > > > -> svs suspend (clk ref = 0) > > > > > > -> thermal suspend (Fail here, thermal HW control w/o clk) > > > > > > > > > > > > Fixes: a825d72f74a3 ("soc: mediatek: fix missing > > > > > > clk_disable_unprepare() > > > > > > on > > > > > > err in svs_resume()") > > > > > > Signed-off-by: Roger Lu <roger.lu@xxxxxxxxxxxx> > > > > > > > > > > That looks wrong. Although I don't out of my mind, there should be a > > > > > way > > > > > to > > > > > tell > > > > > the clock framework that this clock is shared between several devices. > > > > > > > > > > I wonder if using clk_enable and clk_disable in svs_resume/suspend > > > > > wouldn't > > > > > be > > > > > enough. > > > > > > > > Oh yes, Common Clock Framework (CCF) knows the clock shared between > > > > several > > > > devices and maintains clock "on/off" by reference count. > > > > > > > > > > The thing is if you use clk_prepare_enable then the clock framework > > > check's > > > if > > > the clock is already prepared, which could happen like you described in > > > the > > > svs_resume (encount error) case in the commit message. The question is, > > > can't > > > we > > > just use clk_enable and clk_disable in resume/suspend and only prepare the > > > clock > > > in the probe function? > > > > We'll think if this can fix the problem. Thanks for the advice very much. > > > > > > > > > We concern how to stop running svs_suspend() when svs clk is already > > > > disabled by > > > > svs_resume(). Take an example as below, if we refers to > > > > __clk_is_enabled() > > > > result for knowing svs clk status, it will return "true" all the time > > > > because > > > > thermal clk is still on. This causes the problem mentioned in commit > > > > message. > > > > > > > > > > I would expect that the kernel takes care that we can't enter a resume > > > path > > > for > > > a device before the suspend path has finished. Honestly I don't really > > > understand the problem here. It seems something different then what you > > > described in the commit message. > > > > > > Please help me understand better. > > > > I see. This patch title needs to be changed to "avoid turning off svs clk > > twice > > unexpectedly" for pointing out the problem precisely. We saw a loophole that > > svs > > clk might be turned off in svs_resume() first and in svs_suspend() again > > without > > enabling svs clk during these the process. Therefore, we try to fix it by > > this > > patch. Below is our thinking process to explain how we got here. > > > > 1. (abandoned) We add __clk_is_enabled() check in svs_suspend() to prevent > > svs > > clk from being turned off twice when svs_resume() turned off svs clk in the > > error-handling process. Nonetheless, we met the NG case in the commit > > message. > > 2. (current patch) We add svs clk control hint to understand if we need to > > run > > svs_suspend() or not if svs_resume() turned off svs clk before. > > > > Did you had a look on the dev_pm_ops? Maybe we can use suspend_late, > resume_early to make sure there is no race condition. I wonder also if we > can't > make sure that this does not happen using device links. Sorry, I can't give > better guidance on how to use this technologies, but I have the feeling we > can > fix this with existing infrastructure. No, we didn't look on dev_pm_ops and it seems like another way to fix this issue with __clk_is_enabled() check in svs_suspend(). Thanks for the advice again. we'll keep looking for any possible answers to this issue. Sincerely, Roger Lu.