Re: [PATCH] nvme: hwmon: provide temperature min and max values for each sensor

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



2019年11月12日(火) 2:35 Guenter Roeck <linux@xxxxxxxxxxxx>:
>
> On Tue, Nov 12, 2019 at 12:56:21AM +0900, Akinobu Mita wrote:
> > 2019年11月11日(月) 1:30 Guenter Roeck <linux@xxxxxxxxxxxx>:
> > >
> >
> [ ... ]
>
> > > > Example output from the "sensors" command:
> > > >
> > > > nvme-pci-0100
> > > > Adapter: PCI adapter
> > > > Composite:    +53.0 C  (low  = -273.0 C, high = +70.0 C)
> > > >                         (crit = +80.0 C)
> > > > Sensor 1:     +56.0 C  (low  = -273.0 C, high = +65262.0 C)
> > > > Sensor 2:     +51.0 C  (low  = -273.0 C, high = +65262.0 C)
> > > > Sensor 5:     +73.0 C  (low  = -273.0 C, high = +65262.0 C)
> > > >
> > >
> > > Have you tried writing the limits ? On my Intel NVME drive (SSDPEKKW512G7), writing
> > > any minimum limit on the Composite temperature sensor results in a temperature
> > > warning, and that warning is sticky until I reset the controller.
> > > I don't see that problem on Samsung SSD 970 EVO 500GB; I have not yet tried others.
> >
> > I have Crucial CT500P1SSD8 and WDC WDS512G1X0C-00ENX0, and I have no
> > problem with these devices.
> >
> > > root@jupiter:/sys/class/hwmon/hwmon0# sensors nvme-pci-0100
> > > nvme-pci-0100
> > > Adapter: PCI adapter
> > > Composite:    +30.0°C  (low  = -273.0°C, high = +70.0°C)
> > >                         (crit = +80.0°C)
> > >
> > > root@jupiter:/sys/class/hwmon/hwmon0# echo 0 > temp1_min
> > > root@jupiter:/sys/class/hwmon/hwmon0# sensors nvme-pci-0100
> > > nvme-pci-0100
> > > Adapter: PCI adapter
> > > Composite:    +30.0°C  (low  =  +0.0°C, high = +70.0°C)  ALARM
> > >                         (crit = +80.0°C)
> > >
> > > It doesn't seem to matter which temperature I write; writing -273000 has
> > > the same result.
> > >
> > > [This is actually why I didn't use the features commands; not that I had observed
> > >   the problem, but I was concerned that problems like this would show up.]
> >
> > Maybe we should introduce a new quirk so that we can avoid changing
> > temperature threshold for such devices.  Could you tell SSDPEKKW512G7's
> > vendor and device ID?  Quick googling answers it's 8086:f1a5, but I want
> > to make sure.
>
> Yes, that is correct.
>
> 01:00.0 Non-Volatile memory controller [0108]: Intel Corporation Device [8086:f1a5] (rev 03)

OK.  I'll add NVME_QUIRK_NO_TEMP_THRESH_CHANGE and mark the device.

> I'll see if I can test this tonight on my other NVMEs. I also dug up an old
> NVMe drive from Toshiba; I'll see if I can connect and test it as well.

That's nice.

> [ ... ]
>
> > > >        */
> > > >       switch (attr) {
> > > >       case hwmon_temp_max:
> > > > -             *val = (data->ctrl->wctemp - 273) * 1000;
> > > > +             err = nvme_get_temp_thresh(data->ctrl, channel, false, val);
> > > > +             if (err)
> > > > +                     *val = (data->ctrl->wctemp - 273) * 1000;
> > >
> > > This would report WCTEMP for all sensors on errors, including errors seen while
> > > the controller is resetting. I think it should be something like
> > >
> > >                 int err = 0;
> > >                 ...
> > >
> > >                 if (!channel)
> > >                         *val = (data->ctrl->wctemp - 273) * 1000;
> > >                 else
> > >                         err = nvme_get_temp_thresh(data->ctrl, channel, false, val);
> > >                 return err;
> > >
> > > assuming we keep using ctrl->wctemp (see below). If changing the upper Composite
> > > temperature sensor limit changes wctemp, and we don't update it, we should not
> > > use it at all after registration and just report the error.
> > >
> > > >               return 0;
> > > > +     case hwmon_temp_min:
> > > > +             return nvme_get_temp_thresh(data->ctrl, channel, true, val);
> > > >       case hwmon_temp_crit:
> > > >               *val = (data->ctrl->cctemp - 273) * 1000;
> > > >               return 0;
> > > > @@ -73,6 +117,23 @@ static int nvme_hwmon_read(struct device *dev, enum hwmon_sensor_types type,
> > > >       return err;
> > > >   }
> > > >
> > > > +static int nvme_hwmon_write(struct device *dev, enum hwmon_sensor_types type,
> > > > +                         u32 attr, int channel, long val)
> > > > +{
> > > > +     struct nvme_hwmon_data *data = dev_get_drvdata(dev);
> > > > +
> > > > +     switch (attr) {
> > > > +     case hwmon_temp_max:
> > > > +             return nvme_set_temp_thresh(data->ctrl, channel, false, val);
> > >
> > > Does this change WCTEMP if written on channel 0 ? If so, we would have to update
> > > the cached value of ctrl->wctemp (or never use it after registration).
> >
> > At least for the devices I have, setting the over temperature threshold
> > doesn't change the WCTEMP.
> > I have checked with  'nvme id-ctrl /dev/nvme0 | grep ctemp'.
> >
>
> Interesting. I just tested this, and the result is the same with Samsung
> SSD 970 EVO. With that in mind, maybe we should really not use wctemp
> at all after initialization, as I had suggested above. What do you think ?

Yes. We only need to use the WCTEMP in nvme_hwmon_is_visible().




[Index of Archives]     [LM Sensors]     [Linux Sound]     [ALSA Users]     [ALSA Devel]     [Linux Audio Users]     [Linux Media]     [Kernel]     [Gimp]     [Yosemite News]     [Linux Media]

  Powered by Linux