Re: [PATCH v2 1/2] hwmon: add ChromeOS EC driver

Hi Thomas,

On 29/5/24 16:23, Thomas Weißschuh wrote:
On 2024-05-29 10:58:23+0000, Stephen Horvath wrote:
On 29/5/24 09:29, Guenter Roeck wrote:
On 5/28/24 09:15, Thomas Weißschuh wrote:
On 2024-05-28 08:50:49+0000, Guenter Roeck wrote:
On 5/27/24 17:15, Stephen Horvath wrote:
On 28/5/24 05:24, Thomas Weißschuh wrote:
On 2024-05-25 09:13:09+0000, Stephen Horvath wrote:
Don't forget it can also return `EC_FAN_SPEED_STALLED`.

<snip>


Thanks for the hint. I'll need to think about how to
handle this better.

Like Guenter, I also don't like returning `-ENODEV`, but I don't have a
problem with checking for `EC_FAN_SPEED_NOT_PRESENT` in case it was
removed since init or something.


That won't happen. Chromebooks are not servers, where one might be able
to replace a fan tray while the system is running.

In one of my test runs this actually happened.
When running on battery, one specific CPU sensor sporadically
returned EC_FAN_SPEED_NOT_PRESENT.


What Chromebook was that? I can't see the code path in the EC source
that would get me there.


I believe Thomas and I both have the Framework 13 AMD. The source code is
here:
https://github.com/FrameworkComputer/EmbeddedController/tree/lotus-zephyr

Correct.

The organisation confuses me a little, but Dustin has previously said on the
Framework forums (https://community.frame.work/t/what-ec-is-used/38574/2):

"This one is based on the Zephyr port of the ChromeOS EC, and tracks
mainline more closely. It is in the branch lotus-zephyr.
All of the model-specific code lives in zephyr/program/lotus.
The 13"-specific code lives in a few subdirectories off the main tree named
azalea."

The EC code is at [0]:

$ ectool version
RO version:    azalea_v3.4.113353-ec:b4c1fb,os
RW version:    azalea_v3.4.113353-ec:b4c1fb,os
Firmware copy: RO
Build info:    azalea_v3.4.113353-ec:b4c1fb,os:7b88e1,cmsis:4aa3ff 2024-03-26 07:10:22 lotus@ip-172-26-3-226
Tool version:  0.0.1-isolate May  6 2024 none

I can confirm mine is the same build too.

From the build info I gather it should be commit b4c1fb, which is the
current HEAD of the lotus-zephyr branch.
Lotus is the Framework 16 AMD, which is very similar to Azalea, the
Framework 13 AMD, which I tested this against.
Both share the same codebase.

Also, I just unplugged my fan and you are definitely correct: the EC only
generates EC_FAN_SPEED_NOT_PRESENT for fans it is not capable of
supporting. Even after a reboot, it just returns 0 RPM for an unplugged fan.
I thought about simulating a stall too, but I was mildly scared I was going
to break one of the tiny blades.

I get the error when unplugging *the charger*.

To be more precise:

It does not happen always.
It does not happen instantly on unplugging.
It goes away after a few seconds/minutes.
During the issue, one specific sensor reads 0xffff.


Oh, I see. I hadn't played around with the temp sensors until now, but I can confirm that the last temp sensor (cpu@4c / temp4) randomly (every ~2-15 seconds) returns EC_TEMP_SENSOR_ERROR (0xfe).
Unplugging the charger doesn't seem to have any impact for me.
The related ACPI sensor also says 180.8°C.
I'll probably create an issue or something shortly.
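For context on the 180.8°C figure: in the ChromeOS EC mapped memory, temperatures are stored as Kelvin minus EC_TEMP_SENSOR_OFFSET (200), and the values 0xfc..0xff are special markers rather than readings. If something decodes the EC_TEMP_SENSOR_ERROR marker as an ordinary temperature, it gets 0xfe + 200 = 454 K, about 180.85°C. That is plausibly where the ACPI value comes from, though this is an assumption and not verified against the firmware. A minimal sketch of that naive decoding (the function name is hypothetical; the constants mirror cros_ec_commands.h):

```c
#include <assert.h>
#include <stdint.h>

/* Values mirrored from the ChromeOS EC header cros_ec_commands.h. */
#define EC_TEMP_SENSOR_ERROR  0xfe
#define EC_TEMP_SENSOR_OFFSET 200 /* stored byte = temperature in K - 200 */

/*
 * Hypothetical helper: convert a raw mapped-memory temperature byte to
 * millidegrees Celsius WITHOUT checking for the special marker values,
 * to show what a reader that ignores the markers would report.
 */
static long naive_raw_to_millicelsius(uint8_t raw)
{
	long kelvin = (long)raw + EC_TEMP_SENSOR_OFFSET;

	/* 0 degC == 273.15 K, i.e. 273150 millidegrees */
	return kelvin * 1000 - 273150;
}
```

Here naive_raw_to_millicelsius(EC_TEMP_SENSOR_ERROR) evaluates to 180850, i.e. 180.85°C, so a reader must filter the marker values before converting.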

I was mildly confused by 'CPU sensors' and 'EC_FAN_SPEED_NOT_PRESENT' in the same sentence, but I'm now assuming you mean the temp sensor?

Ok.

My approach was to return the speed as `0`, since the fan probably isn't
spinning, but set HWMON_F_FAULT for `EC_FAN_SPEED_NOT_PRESENT` and
HWMON_F_ALARM for `EC_FAN_SPEED_STALLED`.
No idea if this is correct though.

I'm not a fan of returning a speed of 0 in case of errors.
I'd rather return -EIO, which can't be mistaken.
Maybe -EIO for both EC_FAN_SPEED_NOT_PRESENT (which should never happen)
and EC_FAN_SPEED_STALLED.

Yeah, that's pretty reasonable.


-EIO is an I/O error. I have trouble reconciling that with
EC_FAN_SPEED_NOT_PRESENT or EC_FAN_SPEED_STALLED.

Looking into the EC source code [1], I see:

EC_FAN_SPEED_NOT_PRESENT means that the fan is not present.
That should return -ENODEV in the above code, but only for
the purpose of making the attribute invisible.

EC_FAN_SPEED_STALLED means exactly that, i.e., that the fan
is present but not turning. The EC code does not expect that
to happen and generates a thermal event in case it does.
Given that, it does make sense to set the fault flag.
The actual fan speed value should then be reported as 0 or
possibly -ENODATA. It should _not_ generate any other error
because that would trip up the "sensors" command for no
good reason.
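The mapping suggested above could look roughly like the following. This is a userspace illustration, not the driver's code: fan_speed_from_raw() is a hypothetical helper and the constants mirror cros_ec_commands.h. In a real hwmon driver, the -ENODEV result would be consumed by the is_visible() callback to hide the attribute, and the fault flag would back the fanN_fault attribute. Reporting 0 RPM for a stall is one of the two options mentioned; returning -ENODATA instead would be the other.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Special fan speed values, mirrored from cros_ec_commands.h. */
#define EC_FAN_SPEED_NOT_PRESENT 0xffff
#define EC_FAN_SPEED_STALLED     0xfffe

/*
 * Hypothetical helper following the suggested mapping:
 *  - NOT_PRESENT: -ENODEV, only used to make the attribute invisible;
 *  - STALLED:     fan exists but does not turn, so report 0 RPM and
 *                 set the fault flag instead of returning an error;
 *  - otherwise:   the raw value is the speed in RPM.
 */
static int fan_speed_from_raw(uint16_t raw, long *rpm, bool *fault)
{
	*fault = false;

	switch (raw) {
	case EC_FAN_SPEED_NOT_PRESENT:
		return -ENODEV;
	case EC_FAN_SPEED_STALLED:
		*rpm = 0;
		*fault = true;
		return 0;
	default:
		*rpm = raw;
		return 0;
	}
}
```

The point of not returning an error for a stall is the one made above: the "sensors" command keeps working and the fault is visible through its own attribute.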

Ack.

Currently I have the following logic (for both fans and temp):

if NOT_PRESENT during probing:
    make the attribute invisible.

if any error during runtime (including NOT_PRESENT):
    return -ENODATA and a FAULT

This should also handle the sporadic NOT_PRESENT failures.
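The proposed logic, sketched in C for both fans and temperature sensors. Again this is a hypothetical userspace illustration: the *_is_visible() and *_read() helpers are invented names, the constants mirror cros_ec_commands.h, and the temperature conversion assumes the Kelvin-minus-200 encoding of the EC's mapped memory.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Mirrored from cros_ec_commands.h. */
#define EC_FAN_SPEED_NOT_PRESENT      0xffff
#define EC_FAN_SPEED_STALLED          0xfffe
#define EC_TEMP_SENSOR_NOT_PRESENT    0xff
#define EC_TEMP_SENSOR_NOT_CALIBRATED 0xfc /* lowest special marker */
#define EC_TEMP_SENSOR_OFFSET         200

/* Probe time: a channel reporting NOT_PRESENT gets no attributes. */
static bool fan_is_visible(uint16_t raw_at_probe)
{
	return raw_at_probe != EC_FAN_SPEED_NOT_PRESENT;
}

static bool temp_is_visible(uint8_t raw_at_probe)
{
	return raw_at_probe != EC_TEMP_SENSOR_NOT_PRESENT;
}

/*
 * Runtime: any special value, including a sporadic NOT_PRESENT,
 * yields -ENODATA for the input attribute and sets the fault flag.
 */
static int fan_read(uint16_t raw, long *rpm, bool *fault)
{
	*fault = raw == EC_FAN_SPEED_NOT_PRESENT ||
		 raw == EC_FAN_SPEED_STALLED;
	if (*fault)
		return -ENODATA;
	*rpm = raw;
	return 0;
}

static int temp_read(uint8_t raw, long *millicelsius, bool *fault)
{
	/* 0xfc..0xff are all marker values, not temperatures */
	*fault = raw >= EC_TEMP_SENSOR_NOT_CALIBRATED;
	if (*fault)
		return -ENODATA;
	*millicelsius = ((long)raw + EC_TEMP_SENSOR_OFFSET) * 1000 - 273150;
	return 0;
}
```

With this split, the sporadic EC_TEMP_SENSOR_ERROR on cpu@4c would surface as a transient fault instead of a bogus 180°C reading.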

What do you think?

Is there any other feedback to this revision or should I send the next?


No, except I'd really like to know which Chromebook randomly generates
an EC_FAN_SPEED_NOT_PRESENT response, because that really looks like a bug.
Also, can you reproduce the problem with the ectool command?

Yes, the ectool command reports the same issue at the same time.

The fan affected was always the sensor cpu@4c, which is
compatible = "amd,sb-tsi".

I have a feeling it was related to the concurrency problems between ACPI and
the CrOS code that are being fixed in another patch by Ben Walsh. I was also
seeing some weird behaviour sometimes, but I *believe* it was fixed by that.

I don't think it's this issue.
Ben's series at [1] is for MEC ECs, which are the older Intel
Frameworks, not the Framework 13 AMD.

Yeah, sorry. I saw it mentioned AMD and threw it into my kernel. I also thought it stopped the 'packet too long' messages (for EC_CMD_CONSOLE_SNAPSHOT), but it did not.

Thanks,
Steve



