On Sun, 12 Jan 2020 at 21:08, Guenter Roeck <linux@xxxxxxxxxxxx> wrote:
>
> On 1/12/20 10:37 AM, Gabriel C wrote:
> > On Sun, 12 Jan 2020 at 16:26, Guenter Roeck <linux@xxxxxxxxxxxx> wrote:
> >>
> >> On 1/12/20 5:45 AM, Gabriel C wrote:
> >>> On Sun, 12 Jan 2020 at 14:07, Guenter Roeck <linux@xxxxxxxxxxxx> wrote:
> >>>>
> >>>> On 1/12/20 4:07 AM, Linus Walleij wrote:
> >>>>> On Sun, Jan 12, 2020 at 1:03 PM Gabriel C <nix.or.die@xxxxxxxxx> wrote:
> >>>>>> On Sun, 12 Jan 2020 at 12:22, Linus Walleij
> >>>>>> <linus.walleij@xxxxxxxxxx> wrote:
> >>>>>>>
> >>>>>>> On Sun, Jan 12, 2020 at 12:18 PM Gabriel C <nix.or.die@xxxxxxxxx> wrote:
> >>>>>>>
> >>>>>>>> What I've noticed however is the nvme temperature low/high values on
> >>>>>>>> the Sensors X are strange here.
> >>>>>>> (...)
> >>>>>>>> Sensor 1: +27.9°C (low = -273.1°C, high = +65261.8°C)
> >>>>>>>> Sensor 2: +29.9°C (low = -273.1°C, high = +65261.8°C)
> >>>>>>> (...)
> >>>>>>>> Sensor 1: +23.9°C (low = -273.1°C, high = +65261.8°C)
> >>>>>>>> Sensor 2: +25.9°C (low = -273.1°C, high = +65261.8°C)
> >>>>>>>
> >>>>>>> That doesn't look strange to me. It seems like reasonable defaults
> >>>>>>> from the firmware if either it doesn't really log the min/max temperatures
> >>>>>>> or hasn't been through a cycle of updating these yet. Just set both
> >>>>>>> to absolute min/max temperatures possible.
> >>>>>>
> >>>>>> Ok I'll check that.
> >>>>>>
> >>>>>> Do you mean by setting the temperatures to use a lmsensors config?
> >>>>>> Or is there a way to set these with a nvme command?
> >>>>>
> >>>>> Not that I know of.
> >>>>>
> >>>>> The min/max are the minimum and maximum temperatures the
> >>>>> device has experienced during this power-on cycle.
> >>>>>
> >>>>
> >>>> No, that would be lowest/highest. The above are (or should be) per-sensor
> >>>> setpoints. The default for those is typically the absolute minimum /
> >>>> maximum of the supported range.
> >>>>
> >>>> Some SATA drives report the lowest/highest temperatures experienced
> >>>> since power cycle, like here.
> >>>>
> >>>> drivetemp-scsi-5-0
> >>>> Adapter: SCSI adapter
> >>>> temp1: +23.0°C (low = +0.0°C, high = +60.0°C)
> >>>>        (crit low = -41.0°C, crit = +85.0°C)
> >>>>        (lowest = +20.0°C, highest = +31.0°C)
> >>>>
> >>>
> >>> The SATA temperatures are fine and reported like this here too, just
> >>> the nvme ones are strange.
> >>>
> >>> drivetemp-scsi-4-0
> >>> Adapter: SCSI adapter
> >>> temp1: +28.0°C (low = +1.0°C, high = +61.0°C)
> >>>        (crit low = +2.0°C, crit = +60.0°C)
> >>>        (lowest = +16.0°C, highest = +31.0°C)
> >>>
> >>> drivetemp-scsi-12-0
> >>> Adapter: SCSI adapter
> >>> temp1: +29.0°C (low = +1.0°C, high = +61.0°C)
> >>>        (crit low = +2.0°C, crit = +60.0°C)
> >>>        (lowest = +18.0°C, highest = +32.0°C)
> >>>
> >>> and so on.
> >>>
> >>> Btw, where can I find the code that does these calculations?
> >>>
> >>
> >> Not sure if that is what you are looking for, but the nvme hardware
> >> monitoring driver is at drivers/nvme/host/hwmon.c, the SATA hardware
> >> monitoring driver is at drivers/hwmon/drivetemp.c.
> >>
> >
> > I'll have a look, thanks.
> >
> > I'm using your v2 patch for the nvme part since you posted it on 5.4 kernels.
> > This is probably why I find the way the temperatures are now reported
> > very strange.
> >
> > The ADATA XPG SX8200 Pro in my laptop seems to work better:
> >
> > nvme-pci-0200
> > Adapter: PCI adapter
> > Composite: +37.9°C (low = -0.1°C, high = +74.8°C)
> >            (crit = +79.8°C)
> >
> > Low is 0° which is what the spec suggests.
> >
> >> The limits on nvme drives are configurable.
> >
> > Yes, I found this out already.
> >
> >> root@server:/sys/class/hwmon# sensors nvme-pci-0100
> >> nvme-pci-0100
> >> Adapter: PCI adapter
> >> Composite: +40.9°C (low = -273.1°C, high = +84.8°C)
> >>            (crit = +84.8°C)
> >> Sensor 1: +40.9°C (low = -273.1°C, high = +65261.8°C)
> >> Sensor 2: +43.9°C (low = -273.1°C, high = +65261.8°C)
> >>
> >> root@server:/sys/class/hwmon# echo 0 > hwmon1/temp2_min
> >> root@server:/sys/class/hwmon# echo 100000 > hwmon1/temp2_max
> >
> > An lm-sensors configuration will work too.
> >
> Sure, the above was just an example.
>
> >> root@server:/sys/class/hwmon# sensors nvme-pci-0100
> >> nvme-pci-0100
> >> Adapter: PCI adapter
> >> Composite: +38.9°C (low = -273.1°C, high = +84.8°C)
> >>            (crit = +84.8°C)
> >> Sensor 1: +38.9°C (low = -0.1°C, high = +99.8°C)
> >> Sensor 2: +42.9°C (low = -273.1°C, high = +65261.8°C)
> >>
> >> If you dislike the defaults, just configure whatever you think is
> >> appropriate for your system.
> >
> > It's not about disliking the values. I want to find out if these Samsung models
> > don't support that, or it is a bug somewhere in writing/calculating the values.
> >
> No, this is not a bug. It is perfectly valid for individual sensors to have
> uninitialized limits. If I recall correctly, the NVME specification
> specifically states that the default settings for individual sensors
> shall be those values (0 and 65535 Kelvin, specifically).
>
> And, yes, I would agree that it is a bit odd that NVME drives report temperatures
> in Kelvin, but such is the world.
>
> > In case Samsung and others don't support such a thing, wouldn't it be
> > better to just ignore the bogus reading altogether?
>
> Again, you can set whatever limits you like. The default limits on many
> hardware sensor chips have odd values.
> Just looking at my system:
>
> nct6797-isa-0a20
> Adapter: ISA adapter
> in0:  +0.48 V (min = +0.00 V, max = +1.74 V)
> in1:  +1.02 V (min = +0.00 V, max = +0.00 V) ALARM
> in2:  +3.39 V (min = +0.00 V, max = +0.00 V) ALARM
> in3:  +3.31 V (min = +0.00 V, max = +0.00 V) ALARM
> in4:  +1.00 V (min = +0.00 V, max = +0.00 V) ALARM
> in5:  +0.14 V (min = +0.00 V, max = +0.00 V) ALARM
> in6:  +0.82 V (min = +0.00 V, max = +0.00 V) ALARM
> in7:  +3.38 V (min = +0.00 V, max = +0.00 V) ALARM
> in8:  +3.26 V (min = +0.00 V, max = +0.00 V) ALARM
> in9:  +1.82 V (min = +0.00 V, max = +0.00 V) ALARM
> in10: +0.00 V (min = +0.00 V, max = +0.00 V)
> in11: +0.74 V (min = +0.00 V, max = +0.00 V) ALARM
> in12: +1.20 V (min = +0.00 V, max = +0.00 V) ALARM
> in13: +0.68 V (min = +0.00 V, max = +0.00 V) ALARM
> in14: +1.50 V (min = +0.00 V, max = +0.00 V) ALARM
>
>
> Are you suggesting that we should not support setting min/max values for
> all drivers just because they are often not initialized to reasonable values
> by default ?

No, I'm not suggesting that.

I'm aware of the strange values reported by I/O monitoring chips and the
lack of documentation, so in that case something is better than nothing.

In the nvme case, there are only two values, which are either supported by
the firmware or not, so I thought it would be reasonable to have known-good
values instead of 65261.8°C, which will probably cause a lot of users to
report it as a bug.

Can we at least have that documented, with an explanation of how the values
can be set/changed?
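
For the documentation, the numbers in this thread can be reproduced with
simple arithmetic. Below is a minimal sketch, assuming the drive stores each
limit as a whole number of Kelvin and that the conversion to and from
millidegrees Celsius uses a 273.15 offset with rounding to the nearest
Kelvin on writes. The function names are illustrative only and are not
taken from hwmon.c.

```python
# Sketch of the Kelvin <-> millidegree Celsius arithmetic discussed above.
# Assumption: limits are stored as whole Kelvin; sysfs values are in
# millidegrees Celsius. Function names are hypothetical.

KELVIN_OFFSET_MILLI = 273150  # 273.15 K expressed in milli-units


def kelvin_to_millicelsius(kelvin: int) -> int:
    """Convert a whole-Kelvin limit to millidegrees Celsius."""
    return kelvin * 1000 - KELVIN_OFFSET_MILLI


def millicelsius_to_kelvin(millicelsius: int) -> int:
    """Convert millidegrees Celsius to the nearest whole Kelvin.

    Only meant for values at or above absolute zero.
    """
    total = millicelsius + KELVIN_OFFSET_MILLI
    if total < 0:
        raise ValueError("temperature below absolute zero")
    return (total + 500) // 1000


if __name__ == "__main__":
    # Default limits mentioned in the thread: 0 K and 65535 K.
    print(kelvin_to_millicelsius(0))      # low default, in millidegrees C
    print(kelvin_to_millicelsius(65535))  # high default, in millidegrees C

    # Values written in the thread: 0 and 100000 millidegrees C.
    for written in (0, 100000):
        stored = millicelsius_to_kelvin(written)
        print(written, stored, kelvin_to_millicelsius(stored))
```

Under these assumptions, 0 K and 65535 K come out as -273150 and 65261850
millidegrees, which is consistent with the -273.1°C and +65261.8°C shown by
sensors. Writing 0 and 100000 is stored as 273 K and 373 K and reads back
as -150 and 99850 millidegrees, consistent with the -0.1°C and +99.8°C in
Guenter's example.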