On Fri, Feb 16, 2018 at 8:49 PM, Philipp Rossak <embed3d@xxxxxxxxx> wrote: > > > On 16.02.2018 05:10, Chen-Yu Tsai wrote: >> >> On Fri, Feb 16, 2018 at 1:53 AM, Philipp Rossak <embed3d@xxxxxxxxx> wrote: >>> >>> >>> >>> On 15.02.2018 15:11, Maxime Ripard wrote: >>>> >>>> >>>> On Wed, Feb 14, 2018 at 02:56:12PM +0100, Philipp Rossak wrote: >>>>> >>>>> >>>>> This patch fixes a bug, that prevents the Allwinner A83T and the A80 >>>>> from a successful boot. >>>>> >>>>> The bug is there since v4.16-rc1 and appeared after the clk branch was >>>>> merged. >>>> >>>> >>>> >>>> Out of curiosity, which patch has introduced this? I couldn't find any >>>> obvious match. >>>> >>> >>> I wasn't also n >> >> >> To be honest, I'm not sure why this is hitting you and not me. >> I have both A83T boards that have assigned-clock-rates set for >> the ac100 clock outputs for WiFi. I have them running 4.16-rc1 >> and have not seen this. The device tree patches that add these >> are in 4.15. >> > > Now it is getting curious ... . > I already mentioned that bug in the sunxi-irc and someone else was hitting > that problem also... > I tested it also with the same toolchain you are using (gcc 7.3.0-1 Debian), > but that didn't made any difference. > > I don't think that issue is related with the Hardware, but to be on the save > side: Which Hardware version of the BPI-M3 do you have? I have version 1.2. > > Can someone else can confirm this bug? So I might have remembered wrong, as I just realized I have your patch in my a83t branches. I don't hit this on the A80, which also has the AC100, but doesn't use assigned-clock-rates in the device tree. Could you try rolling back to 4.15 and see if you still hit it? > > >>> >>>>> You can find the shortend trace below: >>>>> >>>>> Unable to handle kernel NULL pointer dereference at virtual address >>>>> 00000000 >>>>> pgd = (ptrval) >>>>> [00000000] *pgd=00000000 >>>>> Internal error: Oops: 5 [#1] SMP ARM >>>>> Modules linked in: >>>>> CPU: 0 PID: 49 Comm: kworker/0:1 Not tainted 4.15.0-10190-gb89e32ccd1be >>>>> #2 >>>>> Hardware name: Allwinner sun8i Family >>>>> Workqueue: events deferred_probe_work_func >>>>> PC is at clk_hw_get_rate+0x0/0x34 >>>>> LR is at ac100_clkout_determine_rate+0x48/0x19c >>>>> >>>>> [ ... ] >>>>> >>>>> (clk_hw_get_rate) from (ac100_clkout_determine_rate+0x48/0x19c) >>>>> (ac100_clkout_determine_rate) from >>>>> (clk_core_set_rate_nolock+0x3c/0x1a0) >>>>> (clk_core_set_rate_nolock) from (clk_set_rate+0x30/0x88) >>>>> (clk_set_rate) from (of_clk_set_defaults+0x200/0x364) >>>>> (of_clk_set_defaults) from (platform_drv_probe+0x18/0xb0) >>>>> >>>>> To fix that bug, we first check if the return of the >>>>> clk_hw_get_parent_by_index is non zero. If it is zero we skip that >>>>> clock parent. >>>>> >>>>> The BUG report could be found here: https://lkml.org/lkml/2018/2/10/198 >>>>> >>>>> Fixes: 04940631b8d2 ("rtc: ac100: Add clk output support") >>>>> >>>>> Signed-off-by: Philipp Rossak <embed3d@xxxxxxxxx> >>>>> --- >>>>> >>>>> Changes in v3: >>>>> * add information when the bug appeared >>>>> * make the comment more clear >>>>> Changes in v2: >>>>> * add tag Fixes: ... to commit message >>>>> * add comment to if statement why we are doing this check >>>>> >>>>> drivers/rtc/rtc-ac100.c | 19 ++++++++++++++++++- >>>>> 1 file changed, 18 insertions(+), 1 deletion(-) >>>>> >>>>> diff --git a/drivers/rtc/rtc-ac100.c b/drivers/rtc/rtc-ac100.c >>>>> index 8ff9dc3fe5bf..2412aa2e8399 100644 >>>>> --- a/drivers/rtc/rtc-ac100.c >>>>> +++ b/drivers/rtc/rtc-ac100.c >>>>> @@ -183,7 +183,24 @@ static int ac100_clkout_determine_rate(struct >>>>> clk_hw >>>>> *hw, >>>>> for (i = 0; i < num_parents; i++) { >>>>> struct clk_hw *parent = clk_hw_get_parent_by_index(hw, >>>>> i); >>>>> - unsigned long tmp, prate = clk_hw_get_rate(parent); >>>>> + unsigned long tmp, prate; >>>>> + >>>>> + /* >>>>> + * The clock has two parents, one is a fixed clock >>>>> which >>>>> is >>>>> + * internally registered by the ac100 driver. The other >>>>> parent >>>>> + * is a clock from the codec side of the chip, which we >>>>> + * properly declare and reference in the devicetree and >>>>> is >>>>> + * not implemented in any driver right now. >>>>> + * If the clock core looks for the parent of that >>>>> second >>>>> + * missing clock, it can't one that is registered and >>>>> + * returns NULL. >>>>> + * Thus we need to check if the parent exists before >>>>> + * we get the parent rate. >>>>> + */ >>>>> + if (!parent) >>>>> + continue; >>>> >>>> >>>> >>>> I'm sorry, but I still don't get it. When you register that clock, you >>>> will give it two parents. Why would that change during the life of the >>>> clock? >>>> >>>> This really looks like a workaround rather than an actual fix. >>>> >>>> Maxime >>>> >>> I agree this is more a workaround! >>> A proper solution/fix would be to define the devicetree correct like >>> this: >>> >>> diff --git a/arch/arm/boot/dts/sun8i-a83t-bananapi-m3.dts >>> b/arch/arm/boot/dts/sun8i-a83t-bananapi-m3.dts >>> index 6550bf0e594b..6f56d429f17e 100644 >>> --- a/arch/arm/boot/dts/sun8i-a83t-bananapi-m3.dts >>> +++ b/arch/arm/boot/dts/sun8i-a83t-bananapi-m3.dts >>> @@ -175,11 +175,18 @@ >>> compatible = "x-powers,ac100-rtc"; >>> interrupt-parent = <&r_intc>; >>> interrupts = <0 IRQ_TYPE_LEVEL_LOW>; >>> - clocks = <&ac100_codec>; >>> + clocks = <&ac100_rtc_32k>; >>> #clock-cells = <1>; >>> clock-output-names = "cko1_rtc", >>> "cko2_rtc", >>> "cko3_rtc"; >>> + >>> + ac100_rtc_32k: rtc-32k-oscillator { >>> + compatible = "fixed-clock"; >>> + #clock-cells = <0>; >>> + clock-frequency = <32768>; >>> + clock-output-names = "ac100-rtc-32k"; >>> + }; >>> }; >>> }; >>> }; >>> >>> What do you think about that solution? >> >> >> That's not quite right either. As I mentioned before, the >> RTC block has two clock inputs, one 4MHz signal from the >> codec block, and one 32.768 kHz signal from an external >> crystal. The original device tree binding describes the >> first one, and the 32.768 kHz clock was registered by >> the RTC driver internally. >> >> If you're going to add the crystal clock, you still need >> to keep the codec one. Note that this does not fix what >> Maxime is asking you. I've already provided an explanation: >> >> The clock core allows registering clocks with not-yet-existing >> clock parents. Parents are matches by string names. If no >> clock by that name is registered yet, the clock core simply >> orphans the new clock if the unregistered parent is its >> current parent or simply ignores that parent if its not the >> current parent. This is entirely valid and is what we are >> counting on here, as we haven't implemented the codec-side >> driver. >> >> Note we also sort of depend on this behavior for sunxi-ng >> clocks, where in some cases we get circular dependencies >> between clock blocks. >> >> BTW, since this is mostly related to the clk subsystem, >> please CC the clk maintainers as well. >> > > Ok, I added also now the clk mailing list and maintainers. > As reference this here is the boot log [1]. > > Im not sure if this in relation to this bug: > But, already with the 4.15 kernel I get from time to time some oops during > runtime on the a31s (bpi-m2) and all are clock related and causes a crash of > the system. Are you referring to the A31 CLK_OUT bug? That is an entirely separate bug in the A31/A31s CCU driver. The bug was there from the beginning, but did not cause a crash as the incompatible data structures had lined up in a way that the embedded data structure that was used by the incorrect clk ops was actually compatible. This changed with 4.16-rc1, as a new field was added to some of the clock types, breaking the alignment. I made some changes but have yet to write the commit log for it. Also I don't have any A31/A31s boards to test it on right now. I can send you the patch later. ChenYu > > Philipp > > > > [1]: https://pastebin.com/5c7hxjsS