Re: [PATCH 5/6] drivers/thermal/exynos: add initial Exynos 850 support

Sam Protsenko <semen.protsenko@xxxxxxxxxx> · Tue, 23 Jul 2024 10:23:36 -0500

On Tue, Jul 23, 2024 at 9:17 AM Mateusz Majewski
<m.majewski2@xxxxxxxxxxx> wrote:
>
> > Do you know what are the possible implications of not using ACPM? As I
> > understand, ACPM is a Samsung's downstream framework which uses APM
> > (Active Power Management) IP block internally to act as an IPC
> > mechanism, which makes it possible to offload any PM related
> > operations (which might get quite heavy, if we are to belive the TRM
> > description of APM) from CPU to APM. I'm not against the direct
> > registers access based implementation (in fact, I'm not sure how that
> > APM/ACPM thing can be implemented in upstreamable way and if it's
> > worth it at all). Just curious if we understand what we are
> > potentially missing out, and if at some point we'll be forced to
> > implement that ACPM thing anyway (for something else)?
>
> Not sure honestly. The downstream v4.10 driver does many operations on
> registers anyway...?
>
> > Not sure if that's true, as already discussed in my comments for the
> > previous patches. Looks like one clock is still needed, which is the
> > PCLK bus clock (to interface registers) which might simultaneously act
> > as an operating (functional) clock.
>
> The code seems to be working correctly without this clock, both register
> reads and writes. Maybe the support for extra sensors, which I couldn't
> get to work, would require this clock?
>

Chances are that clock was enabled by the bootloader for us (or it's
just enabled by default) and it just keeps running. If that's so, I'd
say it must be described in dts and controlled by the driver. Because
otherwise it might get disabled at any point in future, e.g. kernel
may disable it during startup as an unused clock (when it's added to
the clock driver), etc. Let me enable that clock for you, and then you
can use /sys/kernel/debug/clk/ files to disable it manually and see if
it actually affects TMU driver.

> > Exynos850 TRM says AVG_CONTROL offset is 0x38, and 0x58 is actually
> > for THRESHOLD0_TEMP_RISE3_2 register.
>
> Thank you so much! Will fix in v2. Though writing to the right place
> doesn't seem to change much in practice, probably just means that the
> correct mode is being used.
>
> > Something seems off to me here. How come the shift value for EXYNOS7
> > case is 8, but the mask is actually 9 bits long? Does it mean the
> > first error field is 8 bits long, and the second error field is 9 bits
> > long for EXYNOS7? I don't have the Exynos7 manual, so it's just a
> > hunch. But if it's true, maybe this shift value has to be added in
> > your [PATCH 2/6] to fix Exynos7 case?
>
> I did not really want to mess with Exynos7 code, as we don't have an
> Exynos7 board sadly. Honestly I feel like I should drop the 2/6 patch
> completely and only modify the code to run on 850 correctly.
>

It feels like there is an error for Exynos7 case there. Take a look at
this commit:

    aef27b658b43 ("thermal: exynos: use sanitize_temp_error() in
exynos7_tmu_initialize()")

I think that commit just forgets to update the shift value for Exynos7
properly. This code:

    data->temp_error1 = trim_info & tmu_temp_mask;
    data->temp_error2 = ((trim_info >> EXYNOS_TRIMINFO_85_SHIFT) &
                EXYNOS_TMU_TEMP_MASK);

in case of Exynos7 becomes:

    data->temp_error1 = trim_info & 0x1ff;    // mask = 9 bits
    data->temp_error2 = (trim_info >> 8) & 0xff;

it contradicts itself, because it takes 9 rightmost bits for error1,
and then uses 1 of those bits for error2 too. It's obvious that if 9
bits are already used for error1, then for error2 it has to be shifted
by 9 bits, not 8.

That's why I think your patch 2/6 is legit and useful on its own, and
it's actually a good catch on your part! But the shift value has to be
fixed as well (for Exynos7). It's not ideal you don't have the
hardware to test it, but it just screams *bug* to me :) Also, maybe we
can ask someone who has Exynos7 hardware to test it for us?

> > Also, just an idea: those values (and other similar values) could be
> > pre-calculated somewhere during the probe, stored in some struct (e.g.
> > _variant or _chip) and then just used here.
>
> sanitize_temp_error is only called one per probe and once per resume, so
> probably little to gain?
>

Sure, it was just a minor suggestion to make the code look more linear
so to speak. It can be totally skipped.

> Will also do all other.