On Mon, Nov 11, 2019 at 1:30 Guenter Roeck <linux@xxxxxxxxxxxx> wrote:
>
> On 11/10/19 6:17 AM, Akinobu Mita wrote:
> > According to the NVMe specification, the over temperature threshold and
> > under temperature threshold features shall be implemented for Composite
> > Temperature if a non-zero WCTEMP field value is reported in the Identify
> > Controller data structure. The features are also implemented for all
> > implemented temperature sensors (i.e., all Temperature Sensor fields that
> > report a non-zero value).
> >
> > This provides the over temperature threshold and under temperature
> > threshold for each sensor as temperature max and min values of hwmon
> > sysfs attributes.
> >
> > The WCTEMP is already provided as a temperature max value for Composite
> > Temperature, but this change isn't incompatible, because the default
> > value of the over temperature threshold for Composite Temperature is
> > the WCTEMP.
> >
> > This also provides alarm attributes for each temperature sensor. But all
> > alarm conditions are the same, because there is only a single bit in the
> > Critical Warning field that indicates one of the temperatures is outside of
> > a temperature threshold.
> >
>
> I think it would be more appropriate to report the alarm only for the
> composite temperature, reason being that we don't really know which individual
> sensor it is associated with.

OK.

> > Example output from the "sensors" command:
> >
> > nvme-pci-0100
> > Adapter: PCI adapter
> > Composite:    +53.0 C  (low  = -273.0 C, high = +70.0 C)
> >                        (crit = +80.0 C)
> > Sensor 1:     +56.0 C  (low  = -273.0 C, high = +65262.0 C)
> > Sensor 2:     +51.0 C  (low  = -273.0 C, high = +65262.0 C)
> > Sensor 5:     +73.0 C  (low  = -273.0 C, high = +65262.0 C)
> >
>
> Have you tried writing the limits ? On my Intel NVME drive (SSDPEKKW512G7), writing
> any minimum limit on the Composite temperature sensor results in a temperature
> warning, and that warning is sticky until I reset the controller.
> I don't see that problem on Samsung SSD 970 EVO 500GB; I have not yet tried others.

I have a Crucial CT500P1SSD8 and a WDC WDS512G1X0C-00ENX0, and I see no
problem with these devices.

> root@jupiter:/sys/class/hwmon/hwmon0# sensors nvme-pci-0100
> nvme-pci-0100
> Adapter: PCI adapter
> Composite:    +30.0°C  (low  = -273.0°C, high = +70.0°C)
>                        (crit = +80.0°C)
>
> root@jupiter:/sys/class/hwmon/hwmon0# echo 0 > temp1_min
> root@jupiter:/sys/class/hwmon/hwmon0# sensors nvme-pci-0100
> nvme-pci-0100
> Adapter: PCI adapter
> Composite:    +30.0°C  (low  =  +0.0°C, high = +70.0°C)  ALARM
>                        (crit = +80.0°C)
>
> It doesn't seem to matter which temperature I write; writing -273000 has
> the same result.
>
> [This is actually why I didn't use the features commands; not that I had observed
> the problem, but I was concerned that problems like this would show up.]

Maybe we should introduce a new quirk so that we can avoid changing the
temperature thresholds on such devices.
Could you tell me the vendor and device ID of the SSDPEKKW512G7?
A quick search suggests 8086:f1a5, but I want to make sure.
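To make the quirk idea concrete, here is a minimal user-space sketch of how a device could be matched and its limit attributes kept read-only. Everything in it is hypothetical: the flag name QUIRK_NO_TEMP_THRESH_CHANGE, the table, and both helper functions are invented for illustration and are not part of the posted patch or of the nvme driver. The 8086:f1a5 entry is the ID found by searching, which is still unconfirmed in this thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical quirk bit: the name is invented for this sketch. */
#define QUIRK_NO_TEMP_THRESH_CHANGE (1u << 0)

struct quirk_entry {
	uint16_t vendor;
	uint16_t device;
	unsigned int quirks;
};

/* 8086:f1a5 is the unconfirmed ID mentioned above for SSDPEKKW512G7. */
static const struct quirk_entry quirk_table[] = {
	{ 0x8086, 0xf1a5, QUIRK_NO_TEMP_THRESH_CHANGE },
};

/* Return the quirk bits for a PCI vendor/device pair, or 0 if none match. */
static unsigned int lookup_quirks(uint16_t vendor, uint16_t device)
{
	size_t i;

	for (i = 0; i < sizeof(quirk_table) / sizeof(quirk_table[0]); i++) {
		if (quirk_table[i].vendor == vendor &&
		    quirk_table[i].device == device)
			return quirk_table[i].quirks;
	}
	return 0;
}

/* Limits stay readable, but are not writable when the quirk is set. */
static unsigned int temp_limit_mode(unsigned int quirks)
{
	return (quirks & QUIRK_NO_TEMP_THRESH_CHANGE) ? 0444 : 0644;
}
```

With such a flag, the min/max attributes would still report the current thresholds on affected devices, and only the write path would be disabled.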
> > Cc: Keith Busch <kbusch@xxxxxxxxxx>
> > Cc: Jens Axboe <axboe@xxxxxx>
> > Cc: Christoph Hellwig <hch@xxxxxx>
> > Cc: Sagi Grimberg <sagi@xxxxxxxxxxx>
> > Cc: Jean Delvare <jdelvare@xxxxxxxx>
> > Cc: Guenter Roeck <linux@xxxxxxxxxxxx>
> > Signed-off-by: Akinobu Mita <akinobu.mita@xxxxxxxxx>
> > ---
> > This patch depends on the patch "nvme: Add hardware monitoring support" [1]
> >
> > [1] http://lists.infradead.org/pipermail/linux-nvme/2019-November/027883.html
> >
> >   drivers/nvme/host/nvme-hwmon.c | 98 ++++++++++++++++++++++++++++++++++++------
> >   include/linux/nvme.h           |  6 +++
> >   2 files changed, 90 insertions(+), 14 deletions(-)
> >
> > diff --git a/drivers/nvme/host/nvme-hwmon.c b/drivers/nvme/host/nvme-hwmon.c
> > index 5480cbb..79323b2 100644
> > --- a/drivers/nvme/host/nvme-hwmon.c
> > +++ b/drivers/nvme/host/nvme-hwmon.c
> > @@ -15,6 +15,46 @@ struct nvme_hwmon_data {
> >  	struct mutex read_lock;
> >  };
> >
> > +static int nvme_get_temp_thresh(struct nvme_ctrl *ctrl, int sensor, bool under,
> > +				long *temp)
> > +{
> > +	unsigned int threshold = sensor << NVME_TEMP_THRESH_SELECT_SHIFT;
> > +	int status;
> > +	int ret;
> > +
> > +	if (under)
> > +		threshold |= NVME_TEMP_THRESH_TYPE_UNDER;
> > +
> > +	ret = nvme_get_features(ctrl, NVME_FEAT_TEMP_THRESH, threshold, NULL, 0,
> > +				&status);
> > +	if (!ret)
> > +		*temp = ((status & NVME_TEMP_THRESH_MASK) - 273) * 1000;
> > +
> > +	return ret <= 0 ? ret : -EIO;
> > +}
> > +
> > +static int nvme_set_temp_thresh(struct nvme_ctrl *ctrl, int sensor, bool under,
> > +				long temp)
> > +{
> > +	unsigned int threshold = sensor << NVME_TEMP_THRESH_SELECT_SHIFT;
> > +	int status;
> > +	int ret;
> > +
> > +	temp = temp / 1000 + 273;
> > +	if (temp > NVME_TEMP_THRESH_MASK)
> > +		return -EINVAL;
> > +
>
> Traditionally we use clamp_val() in hwmon drivers to adjust value ranges
> for limit attributes, reason being that we can't expect userspace to dig
> through per-sensor-type documentation to identify valid limits.
> Also, note that the above does not handle negative values well
> (-274000 -> -274 -> -1).
> I would suggest something like
>
>         temp = temp / 1000 + 273;
>         temp = clamp_val(temp, 0, NVME_TEMP_THRESH_MASK);
>
> or, if you want to be fancy;
>
>         temp = DIV_ROUND_CLOSEST(temp, 1000) + 273;
>         temp = clamp_val(temp, 0, NVME_TEMP_THRESH_MASK);

Either way looks good.

> > +	threshold |= temp;
> > +
> > +	if (under)
> > +		threshold |= NVME_TEMP_THRESH_TYPE_UNDER;
> > +
> > +	ret = nvme_set_features(ctrl, NVME_FEAT_TEMP_THRESH, threshold, NULL, 0,
> > +				&status);
>
> I am a bit baffled here. The last parameter of nvme_set_features() (and nvme_get_features)
> is a pointer to u32, but status is declared as int. I would have assumed this generates
> a compiler warning, but it doesn't, at least not with my version of gcc.
>
> Either case, it might be better to declare status as u32 (unless I did not have enough
> coffee and I am missing something).
>
> Also, I assume that the returned status value is irrelevant. I don't find useful
> information in the specification, but I may be missing it.

You are right. I'll pass NULL as the last argument of nvme_set_features().

> > +
> > +	return ret <= 0 ? ret : -EIO;
> > +}
> > +
> >  static int nvme_hwmon_get_smart_log(struct nvme_hwmon_data *data)
> >  {
> >  	int ret;
> > @@ -39,8 +79,12 @@ static int nvme_hwmon_read(struct device *dev, enum hwmon_sensor_types type,
> >  	 */
> >  	switch (attr) {
> >  	case hwmon_temp_max:
> > -		*val = (data->ctrl->wctemp - 273) * 1000;
> > +		err = nvme_get_temp_thresh(data->ctrl, channel, false, val);
> > +		if (err)
> > +			*val = (data->ctrl->wctemp - 273) * 1000;
>
> This would report WCTEMP for all sensors on errors, including errors seen while
> the controller is resetting. I think it should be something like
>
>         int err = 0;
>         ...
>
>         if (!channel)
>                 *val = (data->ctrl->wctemp - 273) * 1000;
>         else
>                 err = nvme_get_temp_thresh(data->ctrl, channel, false, val);
>         return err;
>
> assuming we keep using ctrl->wctemp (see below). If changing the upper Composite
> temperature sensor limit changes wctemp, and we don't update it, we should not
> use it at all after registration and just report the error.
>
> >  		return 0;
> > +	case hwmon_temp_min:
> > +		return nvme_get_temp_thresh(data->ctrl, channel, true, val);
> >  	case hwmon_temp_crit:
> >  		*val = (data->ctrl->cctemp - 273) * 1000;
> >  		return 0;
> > @@ -73,6 +117,23 @@ static int nvme_hwmon_read(struct device *dev, enum hwmon_sensor_types type,
> >  	return err;
> >  }
> >
> > +static int nvme_hwmon_write(struct device *dev, enum hwmon_sensor_types type,
> > +			    u32 attr, int channel, long val)
> > +{
> > +	struct nvme_hwmon_data *data = dev_get_drvdata(dev);
> > +
> > +	switch (attr) {
> > +	case hwmon_temp_max:
> > +		return nvme_set_temp_thresh(data->ctrl, channel, false, val);
>
> Does this change WCTEMP if written on channel 0 ? If so, we would have to update
> the cached value of ctrl->wctemp (or never use it after registration).

At least for the devices I have, setting the over temperature threshold
doesn't change the WCTEMP. I have checked with
'nvme id-ctrl /dev/nvme0 | grep ctemp'.

> > +	case hwmon_temp_min:
> > +		return nvme_set_temp_thresh(data->ctrl, channel, true, val);
> > +	default:
> > +		break;
> > +	}
> > +
> > +	return -EOPNOTSUPP;
> > +}
> > +
> >  static const char * const nvme_hwmon_sensor_names[] = {
> >  	"Composite",
> >  	"Sensor 1",
> > @@ -105,13 +166,13 @@ static umode_t nvme_hwmon_is_visible(const void *_data,
> >  			return 0444;
> >  		break;
> >  	case hwmon_temp_max:
> > +	case hwmon_temp_min:
> >  		if (!channel && data->ctrl->wctemp)
> > -			return 0444;
> > +			return 0644;
> > +		else if (data->log.temp_sensor[channel - 1])
> > +			return 0644;
>
> This ends up with a negative index into data->log.temp_sensor
> if data->ctrl->wctemp == 0.
> It needs to be

Oops.

>                 else if (channel && data->log.temp_sensor[channel - 1])
>
> It can also be written as a single conditional since the return value is the same.

Sounds good.
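For reference, the two corrections agreed on in this thread (clamping the written limit, and guarding the sensor index in the visibility check) can be sketched together as a small user-space model. This is only an illustration under stated assumptions, not the next version of the patch: TEMP_THRESH_MASK is assumed to be 0xffff (inferred from the +65262.0 C upper limit in the "sensors" output), NUM_TEMP_SENSORS is assumed to be 8, and clamp_long(), div_round_closest(), struct hwmon_state, mcelsius_to_thresh() and limit_mode() are hypothetical stand-ins for clamp_val(), DIV_ROUND_CLOSEST() and the driver's own structures. Unlike the two-line suggestion above, the sketch clamps before dividing so that the rounding step cannot overflow.

```c
#include <stddef.h>

/* Assumed field width; stands in for NVME_TEMP_THRESH_MASK. */
#define TEMP_THRESH_MASK 0xffffL
/* Assumed number of per-sensor entries in the SMART log. */
#define NUM_TEMP_SENSORS 8

/* Stand-in for the kernel's clamp_val(). */
static long clamp_long(long v, long lo, long hi)
{
	if (v < lo)
		return lo;
	if (v > hi)
		return hi;
	return v;
}

/* Signed round-to-nearest division; stand-in for DIV_ROUND_CLOSEST(). */
static long div_round_closest(long x, long d)
{
	return x >= 0 ? (x + d / 2) / d : (x - d / 2) / d;
}

/* Convert a sysfs limit in millidegrees Celsius to a kelvin threshold. */
static unsigned int mcelsius_to_thresh(long temp)
{
	/* Clamp to the representable range first, then round and offset. */
	temp = clamp_long(temp, -273000L, (TEMP_THRESH_MASK - 273) * 1000);
	return (unsigned int)(div_round_closest(temp, 1000) + 273);
}

/* Simplified stand-in for the fields read by the visibility callback. */
struct hwmon_state {
	unsigned int wctemp;
	unsigned int temp_sensor[NUM_TEMP_SENSORS];
};

/* Mode of the min/max attributes; channel 0 is the Composite sensor. */
static unsigned int limit_mode(const struct hwmon_state *s, int channel)
{
	/* Single conditional; the index is only used for channels 1..8. */
	if ((!channel && s->wctemp) ||
	    (channel > 0 && channel <= NUM_TEMP_SENSORS &&
	     s->temp_sensor[channel - 1]))
		return 0644;
	return 0;
}
```

With this conversion, writing -274000 yields a threshold of 0 K instead of -1, and a controller that reports WCTEMP == 0 no longer causes temp_sensor to be read at index -1 for channel 0.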