On 11.01.2025 01:20, Guenter Roeck wrote:
> On 1/10/25 13:41, Heiner Kallweit wrote:
>> On 10.01.2025 22:10, Andrew Lunn wrote:
>>>> - over-temp alarm remains set, even if temperature drops below threshold
>>>
>>>> +int rtl822x_hwmon_init(struct phy_device *phydev)
>>>> +{
>>>> +	struct device *hwdev, *dev = &phydev->mdio.dev;
>>>> +	const char *name;
>>>> +
>>>> +	/* Ensure over-temp alarm is reset. */
>>>> +	phy_clear_bits_mmd(phydev, MDIO_MMD_VEND2, RTL822X_VND2_TSALRM, 3);
>>>
>>> So it is possible to clear the alarm.
>>>
>>> I know you wanted to experiment with this some more....
>>>
>>> If the alarm is still set, does that prevent the PHY renegotiating the
>>> higher link speed? If you clear the alarm, does that allow it to
>>> renegotiate the higher link speed? Or is a down/up still required?
>>> Does an down/up clear the alarm if the temperature is below the
>>> threshold?
>>>
>> I tested wrt one of your previous questions, when exceeding the
>> temperature threshold the chip actually removes 2.5Gbps from the
>> advertisement register.
>>
>> If the alarm is set, the chip won't switch back automatically to
>> 2.5Gbps even if the temperature drops below the alarm threshold.
>>
>> When clearing the alarm the chip adds 2.5Gbps back to the advertisement
>> register. Worth to be mentioned:
>> The temperature is checked only if the link speed is 2.5Gbps.
>> Therefore the chip thinks it's safe to add back the 2.5Gbps mode
>> when the alarm is cleared.
>>
>> What I didn't test is whether it's possible to manually add 2.5Gbps
>> to the advertisement register whilst the alarm is set.
>> But I assume that's the case.
>>
>>> Also, does HWMON support clearing alarms? Writing a 0 to the file? Or
>>> are they supported to self clear on read?
>>>
>> Documentation/hwmon/sysfs-interface.rst states that the alarm
>> is a read-only attribute:
>>
>> +-------------------------------+-----------------------+
>> | **`in[0-*]_alarm`,            | Channel alarm         |
>> | `curr[1-*]_alarm`,            |                       |
>> | `power[1-*]_alarm`,           | - 0: no alarm         |
>> | `fan[1-*]_alarm`,             | - 1: alarm            |
>> | `temp[1-*]_alarm`**           |                       |
>> |                               | RO                    |
>> +-------------------------------+-----------------------+
>>
>> Self-clearing is neither mentioned in the documentation nor
>> implemented in hwmon core.
>
> I would argue that self clearing is implied in "RO". This isn't a hwmon
> core problem, it needs to be implemented in drivers. Many chips auto-clear
> alarm attributes on read. For those this is automatic. Others need
> to explicitly implement clearing alarms.
>

Thanks a lot for the clarifications.
Regarding RO and self-clearing, see the following snippet from a PHY datasheet.
This naming is quite common IMO. I think using RC in the alarm attribute
description would be clearer.

Type  Description
LH    Latch high. If the status is high, this field is set to ‘1’ and remains set.
RC    Read-cleared. The register field is cleared after read.
RO    Read only.
WO    Write only.
RW    Read and Write.
SC    Self-cleared. Writing a ‘1’ to this register field causes the function
      to be activated immediately, and then the field will be automatically
      cleared to ‘0’.

>>
>> @Guenter:
>> If alarm would just mean "current value > alarm threshold", then we
>> wouldn't need an extra alarm attribute, as this is something which
>> can be checked in user space.
>
> Alarm attributes, if implemented properly and if a chip supports interrupts,
> should generate sysfs and udev events to inform userspace. An alarm
> doesn't just mean "current value > alarm threshold", it can also mean that
> the current value was above the threshold at some point since the attribute
> was read the last time. For that to work, the attribute must be sticky
> until read.
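
To check that I understand the "sticky until read" semantics, here is a
minimal userspace model of them. It is only a sketch: struct temp_sensor,
sensor_update() and alarm_read() are hypothetical names made up for
illustration, and nothing here is taken from the Realtek driver or from
hwmon core. The alarm latches once the threshold is exceeded, and a read
reports the latched state and then re-arms the latch from the current
condition.

```c
#include <stdbool.h>

/* Hypothetical model of a latched over-temp alarm (illustration only). */
struct temp_sensor {
	int temp_mc;		/* last sampled temperature, millidegrees C */
	int max_mc;		/* alarm threshold, millidegrees C */
	bool alarm_latched;	/* set once the threshold has been exceeded */
};

/* Record a new sample and latch the alarm if it exceeds the threshold. */
void sensor_update(struct temp_sensor *s, int temp_mc)
{
	s->temp_mc = temp_mc;
	if (temp_mc > s->max_mc)
		s->alarm_latched = true;
}

/*
 * Read-clear semantics: return the latched state, then clear the latch.
 * If the temperature is still above the threshold, the latch is set again
 * right away, so the next read keeps reporting the alarm.
 */
int alarm_read(struct temp_sensor *s)
{
	int val = s->alarm_latched;

	s->alarm_latched = s->temp_mc > s->max_mc;
	return val;
}
```

With this model, a spike above the threshold that has already passed is
reported exactly once on the next read, while a persisting over-temp
condition is reported on every read.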
>
> FWIW, I am sure you'll find lots of drivers not implementing this properly,
> so there is no need to search for those and use them as precedent.
>
> If you want to support alarm attributes or not is obviously your call,
> but they should be self clearing if implemented. I don't want to get complaints
> along the line of "the alarm attribute is set but doesn't clear even though
> the temperature (or voltage, or whatever) is below the threshold".
>
>> Has it ever been considered that a user may have to explicitly ack
>> an alarm to clear it? Would you consider it an ABI violation if
>> alarm is configured as R/W for being able to clear the alarm?
>>
>
> Yes.
>
> Guenter
>
>>> I'm wondering if we are heading towards ABI issues? You have defined:
>>>
>>> - over-temp alarm remains set, even if temperature drops below threshold
>>>
>>> so that kind of eliminates the possibility of implementing self
>>> clearing any time in the future. Explicit clearing via a write is
>>> probably O.K, because the user needs to take an explicit action. Are
>>> there other ABI issues i have not thought about.
>>>
>>> Andrew
>>
>> Heiner
>