Re: [RFCv1 5/5] arm64/ARM: configs: Change CONFIG_PWM_MESON from m to y

Martin Blumenstingl <martin.blumenstingl@xxxxxxxxxxxxxx> · Thu, 24 Oct 2019 22:20:16 +0200

Hi Anand,

On Mon, Oct 21, 2019 at 4:11 PM Anand Moon <linux.amoon@xxxxxxxxx> wrote:
>
> Hi Martin,
>
> On Fri, 18 Oct 2019 at 23:40, Martin Blumenstingl
> <martin.blumenstingl@xxxxxxxxxxxxxx> wrote:
> >
> > Hi Anand,
> >
> > On Fri, Oct 18, 2019 at 4:04 PM Anand Moon <linux.amoon@xxxxxxxxx> wrote:
> > [...]
> > > > Next step it to try narrow down the clock causing the issue.
> > > > Remove clk_ignore_unused from the command line and add CLK_INGORE_UNUSED
> > > > to the flag of some clocks your clock controller (g12a I think) until
> > > >
> > > > The peripheral clock gates already have this flag (something we should
> > > > fix someday) so don't bother looking there.
> > > >
> > > > Most likely the source of the pwm is getting disabled between the
> > > > late_init call and the probe of the PWM module. Since the pwm is already
> > > > active (w/o a driver), gating the clock source shuts dowm the power to
> > > > the cores.
> > > >
> > > > Looking a the possible inputs in pwm driver, I'd bet on fdiv4.
> > > >
> > >
> > > I had give this above steps a try but with little success.
> > > I am still looking into this much close.
> > it's not clear to me if you have only tested with the PWM and/or
> > FCLK_DIV4 clocks. can you please describe what you have tested so far?
> >
> Sorry for delayed response.
>
> I had just looked into clk related to SD_EMMC_A/B/C,
> with adding CLK_IGNORE/CRITICAL.
> Also looked into clk_summary for eMMC and microSD card,
> to identify the root cause, but I failed to move ahead.
I learned to be aware of the decisions that I make when finding a bug somewhere
instead of following the initial problem that I see I ask myself "is
there any proof that this initial problem is the actual root cause".
I can then make the decision to do some experiments to rule out a
problem - until I come to a point where I ask myself again "am I still
going in the right direction - how does this bring me to the root
cause of the problem"
unfortunately that's harder than it seems - but it keeps me from
spending time going in the wrong direction

> > for reference - my way of debugging this in the past was:
> > 1. add some printks to clk_disable_unused_subtree (right after the
> > clk_core_is_enabled check) to see which clocks are being disabled
> > 2. add CLK_IGNORE_UNUSED or CLK_IS_CRITICAL to the clocks which are
> > being disabled based on the information from step #1
> > 3. (at some point I had a working kernel with lots of clocks with
> > CLK_IGNORE_UNUSED/CLK_IS_CRITICAL)
> > 4. start dropping the CLK_IGNORE_UNUSED/CLK_IS_CRITICAL flags again
> > until you have traced it down to the clocks that are the actual issue
> > (so far I always had only one clock which caused issues, but it may be
> > multiple)
> > 5. investigate (and/or ask on the mailing list, Amlogic developers are
> > reading the mails here as well) for the few clocks from step #4
> >
>
> Thanks for you valuable suggestion. I have your patch to debug this
> [0]  https://patchwork.kernel.org/patch/9725921/mbox/
>
> So from the fist step I could identify that all the clk were getting closed
> after some core cpu clk was failing. Here is the log.
>
> step1: [1] https://pastebin.com/p13F9HGG
>
> so I marked these clk as CLK_IGNORE_UNUSED and finally
> I made it to boot using microSD card.
nice, congrats for finding this!

> After this just I converted these CLK to CLK_IS_CRITICAL
> as mostly these are used the CPU clk for now.
> Here is boot log successful for as of now.
>
> Finally: [2]  https://pastebin.com/qB6pMyGQ
>
> I know clk maintainer are against marking flags as *CLK_IS_CRITICAL*
> But this is just the step to move ahead.
>
> Attach is my local clk and dts patch.Just for testing.
> [3] clk_critical.patch
>
> Plz share your thought on this.
interesting, the clock driver for the 32-bit SoCs
(driver/clk/meson/meson8b.c) sets CLK_IS_CRITICAL for meson8b_cpu_clk.
you have something similar in your patch for the G12A/B CPU clocks
I guess that also explains why changing CONFIG_PWM_MESON from =m to =y
"fixes" it:
- as long as the PWM driver is not loaded the VDDCPU regulator does
not probe either
- this goes on for the initial boot process
- now the PWM driver is still not loaded and the common clock
framework tries to disable the unused clocks
- it disables the CPU clock and the system now stops working
- (only later it would load the PWM driver and allow the cpufreq
subsystem to come up)

with CONFIG_PWM_MESON=y you get:
- PWM driver is built-in so the VDDCPU regulator shows up
- the cpufreq subsystem comes up and enables the clock (in reality it
only increments the refcount because the clock is already enabled)
- the common clock framework tries to disable the unused clocks - it
doesn't disable the CPU clock this time because it's used (according
to the ref count/enable count)
- ...

Martin