Hi Armin, Werner, On 1-Dec-24 6:58 PM, Armin Wolf wrote: > Am 27.11.24 um 12:21 schrieb Werner Sembach: > >> Hi, >> >> Following up to https://lore.kernel.org/all/172b7acd-4313-4924-bcbc-41b73b39ada0@xxxxxxxxxxxxxxxxxxx/ and https://lore.kernel.org/all/f26d867e-f247-43bb-a78b-be0bce35c973@xxxxxxxxxxxx/ I experimented with the thermal subsystem and these are my results so far, but I'm hitting a bit of a wall: >> >> As far as I can tell to implement "2. As long as GTMP is > 80°C fan speed must be at least 30%." I would need to add a new gevenor, lets call it "user_space_with_safeguards". I would be nice when the temp <-> fanspeed relation could be passed via the thermal_trip structure. And safeguarding the hardware from userspace only works when I can restrict userspace from just selecting the preexisting "user_space" govenor. >> >> So my ideas/questions: >> - Add a field "min_fanspeed_percent" to the thermal_trip struct that will only be used by the "user_space_with_safeguards" govenor >> - Add a "user_space_with_safeguards" govenor that is the same as the "user_space" govenor, but on trip, a minimum speed is applied >> - How can i ensure that on further speed updates the min speed is applied to? I could just implement it directly in the cdev, but that would be spagetti coding around the govenor. >> - Can I somehow restrict userspace from using certain govenors? >> - I'm a litte bit confused about the thermal zone "mode" sysfs switch, here it says deactivate for userspace control: https://elixir.bootlin.com/linux/v6.12/source/Documentation/ABI/testing/sysfs-class-thermal#L20, but what about the user_space govenor then? > > Hi, > > i am having little experience with the thermal subsystem, but i suggest that policy decisions like "min_fanspeed_percent" should either: > > - come from the hardware/firmware itself > - come from userspace > > Effectively this driver tries to enforce a Tuxedo-specific policy that is not directly based on hardware limits. The book "Linux Device Drivers" > says: > > "the role of a device driver is providing///mechanism/, not/policy/." > > Furthermore: > > "When/writing/ drivers, a programmer should pay particular attention to this fundamental concept: write kernel code to access the hardware, > but don't force particular policies on the user, since different users have different needs. The driver should deal with making the hardware > available, leaving all the issues about/how/ to use the hardware to the applications. A driver, then, is flexible if it offers access to the > hardware capabilities without adding constraints." > > The issue is that the Tuxedo-specific policy is not directly connected with the hardware. Some hardware might need a bigger minimum fan speed than > other hardware for example. > > The hardware (or rather firmware in this case) should communicate those constraints to the linux driver so that we do not need to rely on random > temperatures for hardware protection. > > This ACPI interface however basically provides us with a hwmon interface and Tuxedo now wants the kernel to enforce their policy on it. I suspect that > happens for warranty reasons, right? > > Maybe there is a way to enforce this policy through userspace applications? An important role of device-drivers is also to avoid driving hardware outside its valid specifications. E.g. charger drivers in the power_supply subsystem have constant_charge_current and constant_charge_voltage settings which give a max charging speed in Amperes and a max charging Voltage (typically 4.2V for single Li-xxx cell batteries). Setting these too high can be very dangerous leading to batteries catching fire / exploding. so naturally the drivers also have and enforce constant_charge_current_max and constant_charge_voltage_max sysfs attributes and the values there do not reflect the maximum what the charger-chip can handle, but the maximum which the manufacturer has specified for the battery. This info can e.g. come from devicetree but if not present then the values applied by the firmware at boot should be read back and used not only as active, but also as max values. I believe what Werner wants is pretty much the same thing, allow userspace to provide its own fan management, but have the kernel monitor the temperature and if the temperature goes say above 80° Celsius, enforce a min fan-speed of 30% and at 90° enforce a fan-speed of say 70%. Yes the exact values are a for of policy, but this is a policy to protect hw from getting damaged and as such is fine to have in the kernel. As for how to implement this with the thermal subsystem (which indeed looks like it might be the way to go) I'm not familiar enough with the thermal subsystem either. Werner, I suggest that you send an email to the thermal subsystem maintainers and ask them about this. It would be good to start with explaining your problem before asking the questions. I would leave out details about the ACPI interface. Just say you have a temp-sensor + pwm to control fan speed combination where you want to allow userspace control while having the kernel monitor things and have the kernel overriding userspace's fan speed setting when things get too hot, to deal with e.g. the userspace process crashing at a low fan speed setting. Regards, Hans