Quoting Stephen Boyd (2023-11-08 20:20:50) > > Here's what's happening without any changes > > 1. mdp and rot are sourcing from disp_cc_pll0 at driver probe, disp_cc_pll0 is enabled > 2. mdp is configured to restore on XO > 3. rot is configured to restore on XO > 4. mdp is switched to XO on clk_enable() > 5. mdp is switched to XO on clk_disable() > 6. disp_cc_pll0 is left untouched > 7. rot branch clk is disabled during late init (disp_cc_mdss_rot_clk) > 8. rot rcg is still enabled > 9. disp_cc_pll0 is disabled during late init > 10. mdp is switched to XO on clk_enable() > 11. mdp branch is stuck off > > I could see the problem happening if mdp branch and rot branch were > swapped. When we enable/disable mdp branch it will actually > enable/disable the rot rcg because feedback is going there. During boot > this is fine because disp_cc_pll0 is enabled. Leaving it enabled > throughout boot hides the problem because enabling mdp branch is > actually enabling rot branch. Once we get to late init, we disable rot > branch (actually mdp branch). The mdp rcg is already parked, so this > should be OK. The problem is the mdp branch (actually rot branch) has > been disabled, while the rot rcg is still sourcing disp_cc_pll0. We'll > disable the pll during late init, and this will wedge the clk off. When > we go to turn on the mdp branch (actually the rot branch) after late > init, we'll try to turn on the branch and the rot rcg will be parented > to the disp_cc_pll0 still (because we never moved it off). > > This patch fixes the problem for me. Obviously, things are still bad > though because we don't report the proper parent to the framework. We > can do that pretty easily by parking at clk registration time though and > also by saving the config register. Argh! I forgot to put back disabling unused clks. This patch is garbage and doesn't actually work. I started directly writing things with devmem and saw that the branch bits are correct. The status bit in the rcg changes when the corresponding branch is enabled. It seems that rot must be parked on XO, otherwise mdp branch will fail to enable after late init. I was hoping these two clks weren't related, but they must be related somehow. I've parked rot on XO right after disp_cc_pll0 is disabled and that also works. My guess is the GDSC is restoring rcg registers, and that restoring requires the source clk to be running (maybe they hardcoded that requirement in the hardware). In the case of rot, the source clk is disp_cc_pll0, which is always enabled during boot. When I hack it so that disp_cc_pll0 is disabled at the end of dispcc probe, and rot is the only clk still sourcing from it, nothing is busted until the gdsc powers off, saves state, and then powers on again. That's the case where I reported the stuck clk warning happens early, before late init. Once the gdsc powers on the second time, it must wedge the rcg hardware and affect mdp even though mdp was parked on XO. Fun! This must be why qcom didn't want to have dirty registers. It must confuse the gdsc hardware and cause it to restore the clk to the parent that isn't enabled. BTW, I tried changing the rcg source for rot after the mdp branch is stuck off, and the rot rcg goes from reporting root off, to reporting root is on, which is weird. Maybe that's just showing the rot rcg is in some sort of wedged state as well. So long story short, I think I understand the problem now. The gdsc is saving and restoring the register contents, and enabling the rcg clks when it does so. If that runs into some stuck rcg problem, it will wedge the clks in weird ways. The only safe thing to do is make sure that when the gdsc is turned off, all the rcgs are parked on sources that are going to be enabled when the gdsc powers on the next time. For now, we've decided that source is XO, because it is simple. It seems like that means we need to park all shared rcg clks at registration time. Or we need to teach the clks about their gdsc and switch rcgs over to the safe source when disabling the gdsc. Parking at init is easy, we call clk_rcg2_shared_disable() from the init routine, and that parks the clk and stashes the config register. I just don't know if that causes some sort of problem for bootsplash logic. The mdp clk could be running quite fast, and we'll basically force it over to XO immediately. It may be better to teach the gdsc code to park the rcgs when disabling the gdsc. Then we can maintain the mdp clk rate out of boot for as long as the gdsc is enabled out of boot, and we contain the "fix" to where the problem is, gdsc can't restore a clk sourcing something that's off. If we did this we still need to fix the parent mapping at clk registration, i.e. parked_cfg needs to be read from the hardware so that get_parent works.