Hi Anand,

On Mon, Oct 21, 2019 at 4:11 PM Anand Moon <linux.amoon@xxxxxxxxx> wrote:
>
> Hi Martin,
>
> On Fri, 18 Oct 2019 at 23:40, Martin Blumenstingl
> <martin.blumenstingl@xxxxxxxxxxxxxx> wrote:
> >
> > Hi Anand,
> >
> > On Fri, Oct 18, 2019 at 4:04 PM Anand Moon <linux.amoon@xxxxxxxxx> wrote:
> > [...]
> > > > Next step is to try to narrow down the clock causing the issue.
> > > > Remove clk_ignore_unused from the command line and add CLK_IGNORE_UNUSED
> > > > to the flags of some clocks in your clock controller (g12a I think) until
> > > >
> > > > The peripheral clock gates already have this flag (something we should
> > > > fix someday) so don't bother looking there.
> > > >
> > > > Most likely the source of the pwm is getting disabled between the
> > > > late_init call and the probe of the PWM module. Since the pwm is already
> > > > active (w/o a driver), gating the clock source shuts down the power to
> > > > the cores.
> > > >
> > > > Looking at the possible inputs in the pwm driver, I'd bet on fdiv4.
> > > >
> > >
> > > I gave the above steps a try, but with little success.
> > > I am still looking into this more closely.
> > It's not clear to me if you have only tested with the PWM and/or
> > FCLK_DIV4 clocks. Can you please describe what you have tested so far?
> >
>
> Sorry for the delayed response.
>
> I looked into the clocks related to SD_EMMC_A/B/C,
> adding CLK_IGNORE_UNUSED/CLK_IS_CRITICAL to them.
> I also looked into clk_summary for the eMMC and microSD card
> to identify the root cause, but I failed to make progress.

I have learned to be aware of the decisions I make when I find a bug
somewhere. Instead of following the initial problem that I see, I ask
myself: "is there any proof that this initial problem is the actual root
cause?"
I can then decide to do some experiments to rule out a problem, until I
come to a point where I ask myself again: "am I still going in the right
direction? How does this bring me to the root cause of the problem?"
Unfortunately that's harder than it seems, but it keeps me from spending
time going in the wrong direction.

> > for reference, my way of debugging this in the past was:
> > 1. add some printks to clk_disable_unused_subtree (right after the
> > clk_core_is_enabled check) to see which clocks are being disabled
> > 2. add CLK_IGNORE_UNUSED or CLK_IS_CRITICAL to the clocks which are
> > being disabled based on the information from step #1
> > 3. (at some point I had a working kernel with lots of clocks with
> > CLK_IGNORE_UNUSED/CLK_IS_CRITICAL)
> > 4. start dropping the CLK_IGNORE_UNUSED/CLK_IS_CRITICAL flags again
> > until you have traced it down to the clocks that are the actual issue
> > (so far I always had only one clock which caused issues, but it may be
> > multiple)
> > 5. investigate (and/or ask on the mailing list, Amlogic developers are
> > reading the mails here as well) for the few clocks from step #4
> >
>
> Thanks for your valuable suggestion. I used your patch to debug this:
> [0] https://patchwork.kernel.org/patch/9725921/mbox/
>
> So from the first step I could identify that all the clocks were getting
> disabled after some core CPU clock was failing. Here is the log.
>
> step1: [1] https://pastebin.com/p13F9HGG
>
> So I marked these clocks as CLK_IGNORE_UNUSED and finally
> I managed to boot using the microSD card.

Nice, congrats on finding this!

> After this I converted these clocks to CLK_IS_CRITICAL,
> as they are mostly used by the CPU clock for now.
> Here is the successful boot log as of now.
>
> Finally: [2] https://pastebin.com/qB6pMyGQ
>
> I know the clk maintainers are against marking clocks as *CLK_IS_CRITICAL*,
> but this is just a step to move ahead.
>
> Attached is my local clk and dts patch. Just for testing.
> [3] clk_critical.patch
>
> Please share your thoughts on this.

Interesting, the clock driver for the 32-bit SoCs
(drivers/clk/meson/meson8b.c) sets CLK_IS_CRITICAL for meson8b_cpu_clk.
You have something similar in your patch for the G12A/B CPU clocks.

I guess that also explains why changing CONFIG_PWM_MESON from =m to =y
"fixes" it:
- as long as the PWM driver is not loaded, the VDDCPU regulator does not
  probe either
- this goes on for the initial boot process
- now the PWM driver is still not loaded and the common clock framework
  tries to disable the unused clocks
- it disables the CPU clock and the system stops working
- (only later would it load the PWM driver and allow the cpufreq
  subsystem to come up)

With CONFIG_PWM_MESON=y you get:
- the PWM driver is built-in, so the VDDCPU regulator shows up
- the cpufreq subsystem comes up and enables the CPU clock (in reality it
  only increments the refcount because the clock is already enabled)
- the common clock framework tries to disable the unused clocks
- it doesn't disable the CPU clock this time because it's in use
  (according to the refcount/enable count)
- ...

Martin
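To make the refcount argument above concrete, here is a minimal userspace sketch in C. It is not kernel code: `struct model_clk`, `model_register()`, `model_enable()`, `model_disable_unused()`, `cpu_clk_survives()`, the clock name `cpu_clk` and the `MODEL_*` flags are all hypothetical names invented for this model. It assumes a flat array instead of the real clock tree and only models the rules discussed in this thread: the unused-clock pass skips clocks with a non-zero enable count or the ignore-unused flag, and a critical clock receives one enable at registration so its count never reaches zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical flag values for this model only; the real CCF flags
 * (CLK_IGNORE_UNUSED, CLK_IS_CRITICAL) are defined in
 * include/linux/clk-provider.h with different values. */
#define MODEL_IGNORE_UNUSED (1u << 0)
#define MODEL_IS_CRITICAL   (1u << 1)

struct model_clk {
	const char *name;
	unsigned int flags;
	unsigned int enable_count; /* how many consumers enabled it */
	bool hw_enabled;           /* state left behind by the bootloader */
};

/* Critical clocks get one enable at registration, so their
 * enable_count cannot drop to 0 through normal consumers. */
static void model_register(struct model_clk *c)
{
	if (c->flags & MODEL_IS_CRITICAL) {
		c->enable_count++;
		c->hw_enabled = true;
	}
}

/* Consumer enable (e.g. cpufreq): bumps the refcount and makes sure the
 * hardware is on, which is a no-op if the bootloader already enabled it. */
static void model_enable(struct model_clk *c)
{
	c->enable_count++;
	c->hw_enabled = true;
}

/* Simplified "disable unused clocks" pass over a flat array (the real
 * framework walks the clock tree, children first). Returns how many
 * clocks it gated. */
static int model_disable_unused(struct model_clk *clks, int n)
{
	int disabled = 0;

	for (int i = 0; i < n; i++) {
		struct model_clk *c = &clks[i];

		if (c->enable_count || (c->flags & MODEL_IGNORE_UNUSED))
			continue;
		if (c->hw_enabled) {
			/* This is where the step #1 debug printk would go. */
			printf("disabling unused clock: %s\n", c->name);
			c->hw_enabled = false;
			disabled++;
		}
	}
	return disabled;
}

/* Runs the boot sequence for one CPU clock and reports whether it is
 * still running afterwards. pwm_builtin models CONFIG_PWM_MESON=y:
 * VDDCPU and cpufreq come up before the unused-clock pass and take a
 * reference on the CPU clock. */
static bool cpu_clk_survives(bool pwm_builtin, unsigned int flags)
{
	struct model_clk cpu = { "cpu_clk", flags, 0, true };

	model_register(&cpu);
	if (pwm_builtin)
		model_enable(&cpu);
	model_disable_unused(&cpu, 1);

	/* Invariant: a critical clock is never gated by this pass. */
	if (flags & MODEL_IS_CRITICAL)
		assert(cpu.hw_enabled);
	return cpu.hw_enabled;
}
```

In this model, `cpu_clk_survives(false, 0)` (the =m case) gates the CPU clock, while `cpu_clk_survives(true, 0)` (the =y case) keeps it because cpufreq holds a reference. Passing `MODEL_IS_CRITICAL` or `MODEL_IGNORE_UNUSED` keeps it running even without the PWM driver, which mirrors why marking the CPU clocks in the patch makes the board boot, at the cost of never letting the framework gate them.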