Hi,

On Mon, Jun 3, 2024 at 3:58 PM Julien Panis <jpanis@xxxxxxxxxxxx> wrote:
>
> On 5/29/24 14:06, AngeloGioacchino Del Regno wrote:
> > Il 29/05/24 11:12, Julien Panis ha scritto:
> >> On 5/29/24 10:33, Chen-Yu Tsai wrote:
> >>> On Wed, May 29, 2024 at 4:17 PM AngeloGioacchino Del Regno
> >>> <angelogioacchino.delregno@xxxxxxxxxxxxx> wrote:
> >>>> Il 29/05/24 07:57, Julien Panis ha scritto:
> >>>>> From: Nicolas Pitre <npitre@xxxxxxxxxxxx>
> >>>>>
> >>>>> Inspired by the vendor kernel but adapted to the upstream thermal
> >>>>> driver version.
> >>>>>
> >>>>> Signed-off-by: Nicolas Pitre <npitre@xxxxxxxxxxxx>
> >>>>> Signed-off-by: Julien Panis <jpanis@xxxxxxxxxxxx>
> >>>> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@xxxxxxxxxxxxx>
> >>> I'm getting some crazy readings which would cause the machine to
> >>> immediately shutdown during boot. Anyone else see this? Or maybe
> >>> my device has bad calibration data?
> >>>
> >>> gpu_thermal-virtual-0
> >>> Adapter: Virtual device
> >>> temp1: +229.7 C
> >>>
> >>> nna_thermal-virtual-0
> >>> Adapter: Virtual device
> >>> temp1: +229.7 C
> >>>
> >>> cpu_big0_thermal-virtual-0
> >>> Adapter: Virtual device
> >>> temp1: -7.2 C
> >>>
> >>> cpu_little2_thermal-virtual-0
> >>> Adapter: Virtual device
> >>> temp1: +157.2 C
> >>>
> >>> cpu_little0_thermal-virtual-0
> >>> Adapter: Virtual device
> >>> temp1: -277.1 C
> >>>
> >>> adsp_thermal-virtual-0
> >>> Adapter: Virtual device
> >>> temp1: +229.7 C
> >>>
> >>> cpu_big1_thermal-virtual-0
> >>> Adapter: Virtual device
> >>> temp1: +229.7 C
> >>>
> >>> cam_thermal-virtual-0
> >>> Adapter: Virtual device
> >>> temp1: +45.4 C
> >>>
> >>> cpu_little1_thermal-virtual-0
> >>> Adapter: Virtual device
> >>> temp1: -241.8 C
> >>
> >> It's likely that your device has bad calibration data indeed. We observed the same
> >> behavior on the mt8186 device we used (a Corsola) and finally realized that the
> >> golden temperature was 0 (device not properly calibrated).
> >>
> >> To make a comparison, we run chromiumos v5.15 and dmesg output was:
> >> 'This sample is not calibrated, fake !!'
> >> Additional debugging revealed that the golden temp was actually 0. As a result,
> >> chromiumos v5.15 does not use the calibration data. It uses some default values
> >> instead. That's why you can observe good temperatures with chromiumos v5.15
> >> even with a device that is not calibrated.
> >>
> >> This feature is not implemented in the driver upstream, so you need a device
> >> properly calibrated to get good temperatures with it. When we forced this
> >> driver using the default values used by chromiumos v5.15 instead of real calib
> >> data (temporarily, just for testing), the temperatures were good.
> >>
> >> Please make sure your device is properly calibrated: 0 < golden temp < 62.
> >>
> >
> > Wait wait wait wait.
> >
> > What's up with that calibration data stuff?
> >
> > If there's any device that cannot use the calibration data, we need a way to
> > recognize whether the provided data (read from efuse, of course) is valid,
> > otherwise we're creating an important regression here.
> >
> > "This device is unlucky" is not a good reason to have this kind of regression.
> >
> > Since - as far as I understand - downstream can recognize that, upstream should
> > do the same.
> > I'd be okay with refusing to even probe this driver on such devices for the
> > moment being, as those are things that could be eventually handled on a second
> > part series, even though I would prefer a kind of on-the-fly calibration or
> > anyway something that would still make the unlucky ones to actually have good
> > readings *right now*.
> >
> > Though, the fact that you assert that you observed this behavior on one of your
> > devices and *still decided to send that upstream* is, in my opinion, unacceptable.
> >
> > Regards,
> > Angelo
>
> I've been trying to find some more information about the criteria
> "device calibrated VS device not calibrated" because there's a
> confusing comment in downstream code (the comment does not
> match what I observe on my device). I'll send a separate patch
> to add this feature over the next few days, when I get additional
> information from MTK about this criteria.

I couldn't wait, so I sent a patch to provide default calibration data,
based on the values and code from the ChromeOS kernels. It seems to work
OK-ish: I get readings in the 40s (degrees C) on my MT8186 device.

Also, your previous patch blocking invalid efuse data has landed,
so I think this series can be relanded.

What do you think, Angelo?

ChenYu
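For illustration only, here is a minimal C sketch of the validity check and fallback discussed in this thread. It assumes the "0 < golden temp < 62" criterion quoted above and a fallback to default calibration data, as the ChromeOS v5.15 kernel is described as doing. The struct, field, and function names (`calib_data`, `golden_temp`, `calib_is_valid()`, `select_calib()`) are hypothetical, not the upstream MediaTek thermal driver's API, and no real default values are shown.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical calibration record; names are illustrative only. */
struct calib_data {
	int golden_temp; /* golden temperature as read from efuse */
	/* other per-sensor calibration fields omitted */
};

/*
 * Criterion quoted in the thread: 0 < golden temp < 62.
 * An uncalibrated sample was observed to report a golden temp of 0.
 */
bool calib_is_valid(const struct calib_data *efuse)
{
	return efuse != NULL &&
	       efuse->golden_temp > 0 && efuse->golden_temp < 62;
}

/*
 * Return the efuse data if it passes the check, otherwise the
 * caller-supplied defaults (the actual default values are not shown).
 */
const struct calib_data *select_calib(const struct calib_data *efuse,
				      const struct calib_data *defaults)
{
	return calib_is_valid(efuse) ? efuse : defaults;
}
```

The stopgap of refusing to probe on such devices would correspond to returning an error instead of the defaults when `calib_is_valid()` fails.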