On 1/10/25 13:41, Heiner Kallweit wrote:
On 10.01.2025 22:10, Andrew Lunn wrote:
- over-temp alarm remains set, even if temperature drops below threshold
+int rtl822x_hwmon_init(struct phy_device *phydev)
+{
+	struct device *hwdev, *dev = &phydev->mdio.dev;
+	const char *name;
+
+	/* Ensure over-temp alarm is reset. */
+	phy_clear_bits_mmd(phydev, MDIO_MMD_VEND2, RTL822X_VND2_TSALRM, 3);
So it is possible to clear the alarm.
I know you wanted to experiment with this some more....
If the alarm is still set, does that prevent the PHY renegotiating the
higher link speed? If you clear the alarm, does that allow it to
renegotiate the higher link speed? Or is a down/up still required?
Does a down/up clear the alarm if the temperature is below the
threshold?
I tested wrt one of your previous questions, when exceeding the
temperature threshold the chip actually removes 2.5Gbps from the
advertisement register.
If the alarm is set, the chip won't switch back automatically to
2.5Gbps even if the temperature drops below the alarm threshold.
When clearing the alarm the chip adds 2.5Gbps back to the advertisement
register. Worth mentioning:
The temperature is checked only if the link speed is 2.5Gbps.
Therefore the chip thinks it's safe to add back the 2.5Gbps mode
when the alarm is cleared.
What I didn't test is whether it's possible to manually add 2.5Gbps
to the advertisement register whilst the alarm is set.
But I assume that's the case.
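The observations above can be summarised as a small state model. This is only a sketch: the struct, the function names and the fallback to 1Gbps are hypothetical and are not taken from the driver or from any datasheet; they merely restate the behaviour reported in this mail.

```c
#include <stdbool.h>

/* Toy model of the reported chip behaviour; all names are hypothetical. */
struct phy_model {
	int threshold_c;	/* over-temp alarm threshold, degrees Celsius */
	int link_mbps;		/* currently negotiated link speed */
	bool alarm;		/* latched over-temp alarm */
	bool adv_2500;		/* 2.5Gbps present in the advertisement */
};

/* The chip evaluates the temperature only while the link runs at 2.5Gbps. */
void phy_model_sample(struct phy_model *p, int temp_c)
{
	if (p->link_mbps != 2500)
		return;

	if (temp_c > p->threshold_c) {
		p->alarm = true;	/* stays set until cleared explicitly */
		p->adv_2500 = false;	/* chip drops 2.5Gbps from advertisement */
		p->link_mbps = 1000;	/* assumption: link falls back to 1Gbps */
	}
}

/* Clearing the alarm makes the chip advertise 2.5Gbps again. */
void phy_model_clear_alarm(struct phy_model *p)
{
	p->alarm = false;
	p->adv_2500 = true;
}
```

In this model a later sample below the threshold changes nothing, because the link is no longer at 2.5Gbps and the temperature is not evaluated; only phy_model_clear_alarm() restores the advertisement.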
Also, does HWMON support clearing alarms? Writing a 0 to the file? Or
are they supposed to self-clear on read?
Documentation/hwmon/sysfs-interface.rst states that the alarm
is a read-only attribute:
+-------------------------------+-----------------------+
| **`in[0-*]_alarm`,            | Channel alarm         |
|   `curr[1-*]_alarm`,          |                       |
|   `power[1-*]_alarm`,         |   - 0: no alarm       |
|   `fan[1-*]_alarm`,           |   - 1: alarm          |
|   `temp[1-*]_alarm`**         |                       |
|                               |   RO                  |
+-------------------------------+-----------------------+
Self-clearing is neither mentioned in the documentation nor
implemented in hwmon core.
I would argue that self clearing is implied in "RO". This isn't a hwmon
core problem, it needs to be implemented in drivers. Many chips auto-clear
alarm attributes on read. For those this is automatic. Others need
to explicitly implement clearing alarms.
@Guenter:
If an alarm just meant "current value > alarm threshold", then we
wouldn't need an extra alarm attribute, as this is something which
can be checked in user space.
Alarm attributes, if implemented properly and if a chip supports interrupts,
should generate sysfs and udev events to inform userspace. An alarm
doesn't just mean "current value > alarm threshold", it can also mean that
the current value was above the threshold at some point since the attribute
was last read. For that to work, the attribute must be sticky
until read.
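A minimal userspace sketch of the "sticky until read" semantics described above. The struct and function names are hypothetical and not taken from hwmon core or any driver. Keeping the latch set while the current value is still above the threshold is an assumption of this sketch, not something the documentation mandates.

```c
#include <stdbool.h>

/* Hypothetical latched alarm; not taken from any real driver. */
struct latched_alarm {
	long threshold;	/* alarm threshold, same unit as the samples */
	long current;	/* most recent sample */
	bool latched;	/* set once a sample exceeded the threshold */
};

/* Called for every new measurement, for example from an interrupt. */
void alarm_sample(struct latched_alarm *a, long value)
{
	a->current = value;
	if (value > a->threshold)
		a->latched = true;
}

/*
 * Read the alarm: report whether the threshold was exceeded since the
 * last read, then re-arm. The latch stays set only if the current value
 * is still above the threshold (assumption of this sketch).
 */
bool alarm_read(struct latched_alarm *a)
{
	bool ret = a->latched;

	a->latched = a->current > a->threshold;
	return ret;
}
```

With this model, a short excursion above the threshold is still reported by the next read even if the value has dropped again in the meantime, and the read after that reports no alarm.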
FWIW, I am sure you'll find lots of drivers not implementing this properly,
so there is no need to search for those and use them as precedent.
Whether you want to support alarm attributes is obviously your call,
but they should be self-clearing if implemented. I don't want to get complaints
along the lines of "the alarm attribute is set but doesn't clear even though
the temperature (or voltage, or whatever) is below the threshold".
Has it ever been considered that a user may have to explicitly ack
an alarm to clear it? Would you consider it an ABI violation if
alarm is configured as R/W for being able to clear the alarm?
Yes.
Guenter
I'm wondering if we are heading towards ABI issues? You have defined:
- over-temp alarm remains set, even if temperature drops below threshold
so that kind of eliminates the possibility of implementing self
clearing any time in the future. Explicit clearing via a write is
probably OK, because the user needs to take an explicit action. Are
there other ABI issues I have not thought about?
Andrew
Heiner