> According to Guenter's feedback the alarm attribute must not be written
> and is expected to be self-clearing on read.
> If we would clear the alarm in the chip on alarm attribute read, then
> we can have the following ugly scenario:
>
> 1. Temperature threshold is exceeded and chip reduces speed to 1Gbps
> 2. Temperature is falling below alarm threshold
> 3. User uses "sensors" to check the current temperature
> 4. The implicit alarm attribute read causes the chip to clear the
>    alarm and re-enable 2.5Gbps speed, resulting in the temperature
>    alarm threshold being exceeded very soon again.
>
> What isn't nice here is that it's not transparent to the user that
> a read-only command from his perspective causes the protective measure
> of the chip to be cancelled.
>
> There's no existing hwmon attribute meant to be used by the user
> to clear a hw alarm once he took measures to protect the chip
> from overheating.

It is generally not the kernel's job to implement policy. User space
should be doing that.

I see two different possible policies, and there are maybe others:

1) The user is happy with one second outages every so often as the
   chip cycles between too hot and down shifting, and cool enough to
   upshift back to the higher speeds.

2) The user prefers to have reliable, slower connectivity and needs to
   explicitly do something like down/up the interface to get it back
   to the higher speed.

I personally would say, from a user support view, 2) is better. A
one-time 1 second break in connectivity and a kernel message is going
to cause fewer issues.

Maybe the solution is that the hwmon alarm attribute is not directly
the hardware bit, but a software interpretation of the system
state. When the alarm fires, copy it into a software alarm state, but
leave the hardware alarm alone. A hwmon read clears the software
state, but leaves the hardware alone. A down/up of the interface will
then clear both the software and hardware alarm state.
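
Roughly, the split between software and hardware alarm state could
look like the following. This is a standalone C model, not driver
code: the struct and function names are invented for illustration,
and it assumes the latched hardware alarm is what keeps the link
downshifted.

```c
#include <stdbool.h>

/* Hypothetical model; none of these names come from a real driver. */
struct thermal_state {
	bool hw_alarm;	/* latched in the chip, keeps the link downshifted */
	bool sw_alarm;	/* what the hwmon alarm attribute reports */
};

/* Chip signals over-temperature: latch both copies. */
void thermal_alarm_fired(struct thermal_state *st)
{
	st->hw_alarm = true;
	st->sw_alarm = true;
}

/* hwmon alarm read: self-clearing, but only the software copy. */
bool thermal_alarm_read(struct thermal_state *st)
{
	bool alarm = st->sw_alarm;

	st->sw_alarm = false;
	return alarm;
}

/* Interface down/up: the explicit user action that clears both. */
void thermal_ifdown_up(struct thermal_state *st)
{
	st->hw_alarm = false;
	st->sw_alarm = false;
}

/* The link stays at the lower speed while the hardware alarm is latched. */
bool thermal_link_downshifted(const struct thermal_state *st)
{
	return st->hw_alarm;
}
```

With this split, running "sensors" reports the alarm once and clears
only the software copy, so the link stays at the lower speed until
the interface is taken down and up again.
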
Anybody wanting policy 1) would then need a daemon polling the state
and taking action. 2) would be the default.

How easy is it for you to get into the alarm state? Did you need an
environment chamber/oven, or is it happening for you with just lots of
continuous traffic at typical room temperature? Are we saying that
cheap USB dongles in a sealed plastic case with poor thermal design
are going to be doing this all the time?

	Andrew