On Fri, Jan 10, 2025 at 01:31:00PM +0100, Konrad Dybcio wrote:
> On 10.01.2025 12:54 AM, Dmitry Baryshkov wrote:
> > On Wed, Jan 08, 2025 at 09:38:19PM +0530, Manaf Meethalavalappu Pallikunhi wrote:
> >>
> >> Hi Dmitry,
> >>
> >>
> >> On 1/8/2025 6:16 PM, Dmitry Baryshkov wrote:
> >>> On Wed, Jan 08, 2025 at 05:57:06PM +0530, Manaf Meethalavalappu Pallikunhi wrote:
> >>>> Hi Dmitry,
> >>>>
> >>>>
> >>>> On 1/3/2025 11:21 AM, Dmitry Baryshkov wrote:
> >>>>> On Tue, Dec 31, 2024 at 05:31:41PM +0530, Manaf Meethalavalappu Pallikunhi wrote:
> >>>>>> Hi Dmitry,
> >>>>>>
> >>>>>> On 12/30/2024 9:10 PM, Dmitry Baryshkov wrote:
> >>>>>>> On Sun, Dec 29, 2024 at 08:53:32PM +0530, Wasim Nazir wrote:
> >>>>>>>> From: Manaf Meethalavalappu Pallikunhi <quic_manafm@xxxxxxxxxxx>
> >>>>>>>>
> >>>>>>>> In QCS9100 SoC, the safety subsystem monitors all thermal sensors and
> >>>>>>>> does corrective action for each subsystem based on sensor violation
> >>>>>>>> to comply safety standards. But as QCS9075 is non-safe SoC it
> >>>>>>>> requires conventional thermal mitigation to control thermal for
> >>>>>>>> different subsystems.
> >>>>>>>>
> >>>>>>>> The cpu frequency throttling for different cpu tsens is enabled in
> >>>>>>>> hardware as first defense for cpu thermal control. But QCS9075 SoC
> >>>>>>>> has higher ambient specification. During high ambient condition, even
> >>>>>>>> lowest frequency with multi cores can slowly build heat over the time
> >>>>>>>> and it can lead to thermal run-away situations. This patch restrict
> >>>>>>>> cpu cores during this scenario helps further thermal control and
> >>>>>>>> avoids thermal critical violation.
> >>>>>>>>
> >>>>>>>> Add cpu idle injection cooling bindings for cpu tsens thermal zones
> >>>>>>>> as a mitigation for cpu subsystem prior to thermal shutdown.
> >>>>>>>>
> >>>>>>>> Add cpu frequency cooling devices that will be used by userspace
> >>>>>>>> thermal governor to mitigate skin thermal management.
> >>>>>>> Does anything prevent us from having this config as a part of the basic
> >>>>>>> sa8775p.dtsi setup? If HW is present in the base version but it is not
> >>>>>>> accessible for whatever reason, please move it the base device config
> >>>>>>> and use status "disabled" or "reserved" to the respective board files.
> >>>>>> Sure, I will move idle injection node for each cpu to sa8775p.dtsi and keep
> >>>>>> it disabled state. #cooling cells property for CPU, still wanted to keep it
> >>>>>> in board files as we don't want to enable any cooling device in base DT.
> >>>>> "we don't want" is not a proper justification. So, no.
> >>>> As noted in the commit, thermal cooling mitigation is only necessary for
> >>>> non-safe SoCs. Adding this cooling cell property to the CPU node in the base
> >>>> DT (sa8775p.dtsi), which is shared by both safe and non-safe SoCs, would
> >>>> violate the requirements for safe SoCs. Therefore, we will include it only
> >>>> in non-safe SoC boards.
> >>> "is only necessary" is fine. It means that it is an optional part which
> >>> is going to be unused / ignored / duplicate functionality on the "safe"
> >>> SoCs. What kind of requirement is going to be violated in this way?
> >>
> >> From the perspective of a safe SoC, any software mitigation that compromises
> >> the safety subsystem’s compliance should not be allowed. Enabling the
> >> cooling device also opens up the sysfs interface for userspace, which we may
> >> not fully control.
> >
> > There are a lot of interfaces exported to the userspace.
> >
> >> Userspace apps or partner apps might inadvertently use
> >> it. Therefore, we believe it is better not to expose such an interface, as
> >> it is not required for that SoC and helps to avoid opening up an interface
> >> that could potentially lead to a safety failure.
> >
> > How can thermal mitigation interface lead to safety failure?
> > Userspace can possibly lower trip points, but it can not override existing
> > firmware-based mitigation.
> > And if there is a known problem with the interface, it should be fixed
> > instead.
>
> I think the intended case to avoid is where a malicious actor would set
> the trips too low, resulting in throttling down the CPU to FMIN / Linux
> throttling CPUs to try and escape what it believes to be possible thermal
> runaway / a system reboot. Not something desired in a car.

Being able to set trip points via sysfs means that the system is already
compromised. At this point it can do whatever the actor wants - e.g.
display a malicious HUD or just a green bar or black screen, scream
through the speakers, etc. That doesn't sound like the temperature trip
points being the only or the major problem of a car.

Anyway, if that's really the only problem, please use the framework to
make the temperature and hysteresis of the trip point R/O for sa8775p /
qcs9100. Other attributes might need to be made R/O too.

It may well be that I'm missing one of the automotive peculiarities
here. In such a case the commit message should be more explicit that it's
AGL or some other requirement and provide a link.

-- 
With best wishes
Dmitry