Re: [PATCH v5 6/6] arm64: dts: qcom: Enable cpu cooling devices for QCS9075 platforms

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Jan 15, 2025 at 12:46:50AM +0530, Manaf Meethalavalappu Pallikunhi wrote:
> 
> Hi Dmitry,
> 
> Sorry, I was busy with some other priority tasks.
> 
> On 1/10/2025 5:24 AM, Dmitry Baryshkov wrote:
> > On Wed, Jan 08, 2025 at 09:38:19PM +0530, Manaf Meethalavalappu Pallikunhi wrote:
> > > Hi Dmitry,
> > > 
> > > 
> > > On 1/8/2025 6:16 PM, Dmitry Baryshkov wrote:
> > > > On Wed, Jan 08, 2025 at 05:57:06PM +0530, Manaf Meethalavalappu Pallikunhi wrote:
> > > > > Hi Dmitry,
> > > > > 
> > > > > 
> > > > > On 1/3/2025 11:21 AM, Dmitry Baryshkov wrote:
> > > > > > On Tue, Dec 31, 2024 at 05:31:41PM +0530, Manaf Meethalavalappu Pallikunhi wrote:
> > > > > > > Hi Dmitry,
> > > > > > > 
> > > > > > > On 12/30/2024 9:10 PM, Dmitry Baryshkov wrote:
> > > > > > > > On Sun, Dec 29, 2024 at 08:53:32PM +0530, Wasim Nazir wrote:
> > > > > > > > > From: Manaf Meethalavalappu Pallikunhi <quic_manafm@xxxxxxxxxxx>
> > > > > > > > > 
> > > > > > > > > In QCS9100 SoC, the safety subsystem monitors all thermal sensors and
> > > > > > > > > does corrective action for each subsystem based on sensor violation
> > > > > > > > > to comply safety standards. But as QCS9075 is non-safe SoC it
> > > > > > > > > requires conventional thermal mitigation to control thermal for
> > > > > > > > > different subsystems.
> > > > > > > > > 
> > > > > > > > > The cpu frequency throttling for different cpu tsens is enabled in
> > > > > > > > > hardware as first defense for cpu thermal control. But QCS9075 SoC
> > > > > > > > > has higher ambient specification. During high ambient condition, even
> > > > > > > > > lowest frequency with multi cores can slowly build heat over the time
> > > > > > > > > and it can lead to thermal run-away situations. This patch restrict
> > > > > > > > > cpu cores during this scenario helps further thermal control and
> > > > > > > > > avoids thermal critical violation.
> > > > > > > > > 
> > > > > > > > > Add cpu idle injection cooling bindings for cpu tsens thermal zones
> > > > > > > > > as a mitigation for cpu subsystem prior to thermal shutdown.
> > > > > > > > > 
> > > > > > > > > Add cpu frequency cooling devices that will be used by userspace
> > > > > > > > > thermal governor to mitigate skin thermal management.
> > > > > > > > Does anything prevent us from having this config as a part of the basic
> > > > > > > > sa8775p.dtsi setup? If HW is present in the base version but it is not
> > > > > > > > accessible for whatever reason, please move it the base device config
> > > > > > > > and use status "disabled" or "reserved" to the respective board files.
> > > > > > > Sure,  I will move idle injection node for each cpu to sa8775p.dtsi and keep
> > > > > > > it disabled state. #cooling cells property for CPU, still wanted to keep it
> > > > > > > in board files as we don't want to enable any cooling device in base DT.
> > > > > > "we don't want" is not a proper justification. So, no.
> > > > > As noted in the commit, thermal cooling mitigation is only necessary for
> > > > > non-safe SoCs. Adding this cooling cell property to the CPU node in the base
> > > > > DT (sa8775p.dtsi), which is shared by both safe and non-safe SoCs, would
> > > > > violate the requirements for safe SoCs. Therefore, we will include it only
> > > > > in non-safe SoC boards.
> > > > "is only necessary" is fine. It means that it is an optional part which
> > > > is going to be unused / ignored / duplicate functionality on the "safe"
> > > > SoCs. What kind of requirement is going to be violated in this way?
> > >  From the perspective of a safe SoC, any software mitigation that compromises
> > > the safety subsystem’s compliance should not be allowed. Enabling the
> > > cooling device also opens up the sysfs interface for userspace, which we may
> > > not fully control.
> > THere are a lot of interfaces exported to the userspace.
> > 
> > > Userspace apps or partner apps might inadvertently use
> > > it. Therefore, we believe it is better not to expose such an interface, as
> > > it is not required for that SoC and helps to avoid opening up an interface
> > > that could potentially lead to a safety failure.
> > How can thermal mitigation interface lead to safety failure? Userspace
> > can possibly lower trip points, but it can not override existing
> > firmware-based mitigation.
> Both firmware and software passive mitigations (CPU/GPU) are not permitted
> for Safe SoC.

Not permitted by whom?

> As mentioned in the commit, it is the responsibility of the
> safety subsystem to take corrective action for high temperatures. One of the
> safety requirements (not a functional requirement) for Safe SoC is to avoid
> scaling of CPUs and bus DCVS, as this could impact safety-critical
> workloads. Therefore, Safe SoC will operate at maximum frequency with the
> performance governor. Enabling a cooling device for the CPU would expose the
> cooling device sysfs interface, allowing userspace to request different
> cooling states via the cooling device cur_state sysfs, which could
> potentially lower the frequency and violate the safety requirement.

So, you disable thermal mitigation controls, but your description allows
userspace to change the CPUfreq governor through sysfs. Isn't that also
unsafe?

> > And if there is a known problem with the interface, it should be fixed
> > instead.
> 
> There is no interface issue, as it is not required and can be disabled for
> Safe SoC. This approach has already been used for commercializing the
> SA8775P, and we do not want to disrupt it now. Therefore, we believe it is
> better not to add cpu cooling cell property (CPU cooling device) in sa8775p
> base dtsi.

Okay, assuming  SA8775P may not have any thermal-related properties,
what is the issue with having them for the IoT-related QCS9100 device?

So, you have an alternative proposal: rename sa8775p.dtsi to
qcs9100.dtsi, defining a full set of thermal properties there and add
sa8775p.dtsi which includes qcs9100.dtsi and just removes thermal and
cpufreq nodes. Doing it in any other way, especially by including a
random SoC-related include defeats the logic of the DTSI files.

> 
> Best Regards,
> 
> Manaf
> 
> > 
> > > Best Regards,
> > > 
> > > Manaf
> > > 

-- 
With best wishes
Dmitry




[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [Linux for Sparc]     [IETF Annouce]     [Security]     [Bugtraq]     [Linux MIPS]     [ECOS]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux