Re: [PATCH 5/6] drivers/thermal/exynos: add initial Exynos 850 support

Sam Protsenko <semen.protsenko@xxxxxxxxxx> · Tue, 23 Jul 2024 11:47:34 -0500

On Tue, Jul 23, 2024 at 10:23 AM Sam Protsenko
<semen.protsenko@xxxxxxxxxx> wrote:
>
> On Tue, Jul 23, 2024 at 9:17 AM Mateusz Majewski
> <m.majewski2@xxxxxxxxxxx> wrote:
> >
> > > Do you know what are the possible implications of not using ACPM? As I
> > > understand, ACPM is a Samsung's downstream framework which uses APM
> > > (Active Power Management) IP block internally to act as an IPC
> > > mechanism, which makes it possible to offload any PM related
> > > operations (which might get quite heavy, if we are to belive the TRM
> > > description of APM) from CPU to APM. I'm not against the direct
> > > registers access based implementation (in fact, I'm not sure how that
> > > APM/ACPM thing can be implemented in upstreamable way and if it's
> > > worth it at all). Just curious if we understand what we are
> > > potentially missing out, and if at some point we'll be forced to
> > > implement that ACPM thing anyway (for something else)?
> >
> > Not sure honestly. The downstream v4.10 driver does many operations on
> > registers anyway...?
> >
> > > Not sure if that's true, as already discussed in my comments for the
> > > previous patches. Looks like one clock is still needed, which is the
> > > PCLK bus clock (to interface registers) which might simultaneously act
> > > as an operating (functional) clock.
> >
> > The code seems to be working correctly without this clock, both register
> > reads and writes. Maybe the support for extra sensors, which I couldn't
> > get to work, would require this clock?
> >
>
> Chances are that clock was enabled by the bootloader for us (or it's
> just enabled by default) and it just keeps running. If that's so, I'd
> say it must be described in dts and controlled by the driver. Because
> otherwise it might get disabled at any point in future, e.g. kernel
> may disable it during startup as an unused clock (when it's added to
> the clock driver), etc. Let me enable that clock for you, and then you
> can use /sys/kernel/debug/clk/ files to disable it manually and see if
> it actually affects TMU driver.
>

Yeah, that clock is definitely needed. Just submitted the series [1]
adding it, which makes the kernel stuck on startup when your series is
applied. To fix that I added next lines to the TMU node in dts:

8<--------------------------------------------------------------------------->8
     tmuctrl_0: tmu@10070000 {
         compatible = "samsung,exynos850-tmu";
         reg = <0x10070000 0x800>;
         interrupts = <GIC_SPI 226 IRQ_TYPE_LEVEL_HIGH>;
+        clock-names = "tmu_apbif";
+        clocks = <&cmu_peri CLK_GOUT_BUSIF_TMU_PCLK>;
         #thermal-sensor-cells = <0>;
    };
8<--------------------------------------------------------------------------->8

Please rework your patches to account for that required clock. Alas
the TMU dts changes can't be submitted until my series [1] is applied.
But you can still apply my series locally, and I think the driver and
bindings changes don't depend on that clock, so it should be ok to
send those

Thanks!

[1] https://lore.kernel.org/linux-samsung-soc/20240723163311.28654-2-semen.protsenko@xxxxxxxxxx/T/#mf28e4aab0111b95479ef632bc1979dff93d28cc7

> > > Exynos850 TRM says AVG_CONTROL offset is 0x38, and 0x58 is actually
> > > for THRESHOLD0_TEMP_RISE3_2 register.
> >
> > Thank you so much! Will fix in v2. Though writing to the right place
> > doesn't seem to change much in practice, probably just means that the
> > correct mode is being used.
> >
> > > Something seems off to me here. How come the shift value for EXYNOS7
> > > case is 8, but the mask is actually 9 bits long? Does it mean the
> > > first error field is 8 bits long, and the second error field is 9 bits
> > > long for EXYNOS7? I don't have the Exynos7 manual, so it's just a
> > > hunch. But if it's true, maybe this shift value has to be added in
> > > your [PATCH 2/6] to fix Exynos7 case?
> >
> > I did not really want to mess with Exynos7 code, as we don't have an
> > Exynos7 board sadly. Honestly I feel like I should drop the 2/6 patch
> > completely and only modify the code to run on 850 correctly.
> >
>
> It feels like there is an error for Exynos7 case there. Take a look at
> this commit:
>
>     aef27b658b43 ("thermal: exynos: use sanitize_temp_error() in
> exynos7_tmu_initialize()")
>
> I think that commit just forgets to update the shift value for Exynos7
> properly. This code:
>
>     data->temp_error1 = trim_info & tmu_temp_mask;
>     data->temp_error2 = ((trim_info >> EXYNOS_TRIMINFO_85_SHIFT) &
>                 EXYNOS_TMU_TEMP_MASK);
>
> in case of Exynos7 becomes:
>
>     data->temp_error1 = trim_info & 0x1ff;    // mask = 9 bits
>     data->temp_error2 = (trim_info >> 8) & 0xff;
>
> it contradicts itself, because it takes 9 rightmost bits for error1,
> and then uses 1 of those bits for error2 too. It's obvious that if 9
> bits are already used for error1, then for error2 it has to be shifted
> by 9 bits, not 8.
>
> That's why I think your patch 2/6 is legit and useful on its own, and
> it's actually a good catch on your part! But the shift value has to be
> fixed as well (for Exynos7). It's not ideal you don't have the
> hardware to test it, but it just screams *bug* to me :) Also, maybe we
> can ask someone who has Exynos7 hardware to test it for us?
>
> > > Also, just an idea: those values (and other similar values) could be
> > > pre-calculated somewhere during the probe, stored in some struct (e.g.
> > > _variant or _chip) and then just used here.
> >
> > sanitize_temp_error is only called one per probe and once per resume, so
> > probably little to gain?
> >
>
> Sure, it was just a minor suggestion to make the code look more linear
> so to speak. It can be totally skipped.
>
> > Will also do all other.