Re: [RFC PATCH 0/3] thermal: Add CPU hotplug cooling driver

On Sun, Mar 9, 2025 at 1:13 PM John Madieu
<john.madieu.xa@xxxxxxxxxxxxxx> wrote:
>
> This patch series introduces a new thermal cooling driver that implements CPU
> hotplug-based thermal management. The driver dynamically takes CPUs offline
> during thermal excursions to reduce power consumption and prevent overheating,
> while maintaining system stability by keeping at least one CPU online.

So as far as I am concerned, this is a total no-go.  CPU offline is
not designed to be triggered from within a driver.

> 1- Problem Statement
>
> Modern SoCs require robust thermal management to prevent overheating under heavy
> workloads. Existing cooling mechanisms like frequency scaling may not always
> provide sufficient thermal relief, especially in multi-core systems where
> per-core thermal contributions can be significant.

What about idle injection?

> 2- Solution Overview
>
> The driver:
>
>  - Integrates with the Linux thermal framework as a cooling device
>  - Registers per-CPU cooling devices that respond to thermal trip points
>  - Uses CPU hotplug operations to reduce thermal load
>  - Maintains system stability by never putting the boot CPU offline,
>  regardless of which CPUs are specified in the cooling device list.
>  - Implements proper state tracking and cleanup
>
> Key Features:
>
>  - Dynamic CPU online/offline management based on thermal thresholds
>  - Device tree-based configuration via thermal zones and trip points

So DT-only.  Not nice.

>  - Hysteresis support through thermal governor interactions

I'd rather not combine thermal governors with CPU offline.

>  - Safe handling of CPU state transitions during module load/unload

Are you sure that it is really safe?

>  - Compatibility with existing thermal management frameworks

I'm not sure about this.

So one of the things that CPU offline does, which you probably are not
aware of, is breaking CPU affinity, which is a very brutal thing for
user space if it is not expecting that to happen.  It also migrates
interrupts between CPUs, which may confuse things too.  So don't do it
from the kernel, really.
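
To make the affinity point concrete from the user-space side, here is a
minimal sketch, not taken from the patch series.  It pins the calling
thread to a single CPU and then checks whether that pinning is still in
place.  The helper names (first_allowed_cpu, pin_to_cpu, still_pinned_to)
are hypothetical; only sched_setaffinity(), sched_getaffinity() and the
CPU_* macros are real glibc interfaces.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>

/* Hypothetical helper: lowest CPU in the caller's current affinity
 * mask, or -1 if the mask cannot be read. */
static int first_allowed_cpu(void)
{
    cpu_set_t set;

    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) != 0)
        return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
        if (CPU_ISSET(cpu, &set))
            return cpu;
    return -1;
}

/* Hypothetical helper: restrict the calling thread to exactly one CPU.
 * Returns 0 on success, -1 on failure (errno set by the syscall). */
static int pin_to_cpu(int cpu)
{
    cpu_set_t set;

    assert(cpu >= 0 && cpu < CPU_SETSIZE);
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof(set), &set);
}

/* Hypothetical helper: 1 if the affinity mask still contains only
 * `cpu`, 0 if it has changed, -1 if it cannot be read. */
static int still_pinned_to(int cpu)
{
    cpu_set_t set;

    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) != 0)
        return -1;
    return CPU_COUNT(&set) == 1 && CPU_ISSET(cpu, &set);
}
```

A task that calls pin_to_cpu() and later polls still_pinned_to() would
normally keep seeing 1.  If the only CPU in its mask is taken offline
behind its back, the scheduler has to move the task elsewhere and the
check is expected to stop returning 1, with nothing in the program
having asked for that.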

Thanks, Rafael




