Re: [PATCH v2 2/2] hwmon: (coretemp) Add support for thermal threshold attributes

Jean Delvare <khali@xxxxxxxxxxxx> · Thu, 13 Oct 2011 16:31:31 +0200

Hi Guenter,

Sorry for the late reply...

On Tue, 27 Sep 2011 15:33:51 -0700, Guenter Roeck wrote:
> Add support for T0 and T1 temperature thresholds using the new sysfs ABI
> attributes tempX_thresholdY and tempX_thresholdY_triggered.
> 
> This patch is based on commit c814a4c7c4aad795835583344353963a0a673eb0, which
> was reverted. For details on the threshold registers, see IA Manual vol 3A,
> which can be downloaded from here:
>         http://download.intel.com/design/processor/manuals/253668.pdf
> 
> Signed-off-by: Guenter Roeck <guenter.roeck@xxxxxxxxxxxx>
> ---
>  Documentation/hwmon/coretemp |   12 +++
>  drivers/hwmon/coretemp.c     |  153 ++++++++++++++++++++++++++++++++++++++----
>  2 files changed, 151 insertions(+), 14 deletions(-)

This works as designed, tested on both a Xeon E5520 and my Thinkpad
laptop with a Core T2600. On the Thinkpad, I can see the values change
in real time under load. The "triggered" files always read 0,
presumably because the BIOS resets the bits to 0 when changing the
threshold values.

Still, I am a little worried that:
* We are adding a new interface but have no idea who is supposed to use
  it (when the BIOS is not, that is) nor how. If user-space is supposed
  to set the threshold values, how does it decide what happens when
  they are crossed? Interrupts handlers would have to live in kernel
  space. Do we want to add support for these thresholds to libsensors?
  If so, how exactly, and how would applications (starting with
  "sensors") expose them?
* The coretemp implementation itself is strange, as we don't even know
  what kind of interrupt is generated on threshold crossing. My
  experiments with my laptop suggests these are SMI, but that may not
  always be the case, I just don't know. If they are SMI then it's up
  to the BIOS to decide, in which case we have no idea what the actions
  will be. If they are not SMI, then what are they?

Without answer to these questions, it doesn't seem reasonable to push
this upstream. And I am still unsure why this would be handled by hwmon
drivers rather than CPU power / thermal management code. Don't get me
wrong, I am not trying to push back the code. Simply I would like to
know where we are going before we accept it.

Meanwhile, I have some minor comments below if you are interested. It
might not be worth spending time on this until the above points are
answered though.

> diff --git a/Documentation/hwmon/coretemp b/Documentation/hwmon/coretemp
> index 84d46c0..d9f09fc 100644
> --- a/Documentation/hwmon/coretemp
> +++ b/Documentation/hwmon/coretemp
> @@ -35,6 +35,18 @@ the Out-Of-Spec bit. Following table summarizes the exported sysfs files:
>  All Sysfs entries are named with their core_id (represented here by 'X').
>  tempX_input	 - Core temperature (in millidegrees Celsius).
>  tempX_max	 - All cooling devices should be turned on (on Core2).
> +tempX_threshold1 - Reflects value of CPU thermal threshold T0.
> +tempX_threshold1_triggered
> +		 - A value of 1 indicates that T0 was crossed from either
> +		   direction. Reflects value of CPU thermal status register
> +		   bit 7 (THERM_LOG_THRESHOLD0).
> +		   Value is reset by writing 0 into the file.
> +tempX_threshold2 - Reflects value of CPU thermal threshold T1.
> +tempX_threshold2_triggered
> +		 - A value of 1 indicates that T1 was crossed from either
> +		   direction. Reflects value of CPU thermal status register
> +		   bit 9 (THERM_LOG_THRESHOLD1).
> +		   Value is reset by writing 0 into the file.

FWIW, the Intel datasheet numbers the thresholds #1 and #2. I have no
idea why Durgadoss decided to go with 0 and 1.

>  tempX_crit	 - Maximum junction temperature (in millidegrees Celsius).
>  tempX_crit_alarm - Set when Out-of-spec bit is set, never clears.
>  		   Correct CPU operation is no longer guaranteed.
> diff --git a/drivers/hwmon/coretemp.c b/drivers/hwmon/coretemp.c
> index 44b2391..b0cb454 100644
> --- a/drivers/hwmon/coretemp.c
> +++ b/drivers/hwmon/coretemp.c
> @@ -52,9 +52,10 @@ MODULE_PARM_DESC(tjmax, "TjMax value in degrees Celsius");
>  
>  #define BASE_SYSFS_ATTR_NO	2	/* Sysfs Base attr no for coretemp */
>  #define NUM_REAL_CORES		16	/* Number of Real cores per cpu */
> -#define CORETEMP_NAME_LENGTH	17	/* String Length of attrs */
> -#define MAX_CORE_ATTRS		4	/* Maximum no of basic attrs */
> -#define TOTAL_ATTRS		(MAX_CORE_ATTRS + 1)
> +#define CORETEMP_NAME_LENGTH	33	/* String Length of attrs */

Curious where the 33 comes from... My own computations suggest 28
should be enough.

> +#define MAX_CORE_ATTRS		5	/* Maximum no of basic attrs */
> +#define MAX_THRESH_ATTRS	4	/* Maximum no of threshold attrs */
> +#define TOTAL_ATTRS		(MAX_CORE_ATTRS + MAX_THRESH_ATTRS)
>  #define MAX_CORE_DATA		(NUM_REAL_CORES + BASE_SYSFS_ATTR_NO)
>  
>  #ifdef CONFIG_SMP
> @@ -77,6 +78,8 @@ MODULE_PARM_DESC(tjmax, "TjMax value in degrees Celsius");

> (...)
> @@ -150,7 +154,7 @@ static ssize_t show_crit_alarm(struct device *dev,
>  static ssize_t show_tjmax(struct device *dev,
>  			struct device_attribute *devattr, char *buf)
>  {
> -	struct sensor_device_attribute *attr = to_sensor_dev_attr(devattr);
> +	struct sensor_device_attribute_2 *attr = to_sensor_dev_attr_2(devattr);
>  	struct platform_data *pdata = dev_get_drvdata(dev);
>  
>  	return sprintf(buf, "%d\n", pdata->core_data[attr->index]->tjmax);
> @@ -159,12 +163,98 @@ static ssize_t show_tjmax(struct device *dev,
>  static ssize_t show_ttarget(struct device *dev,
>  				struct device_attribute *devattr, char *buf)
>  {
> -	struct sensor_device_attribute *attr = to_sensor_dev_attr(devattr);
> +	struct sensor_device_attribute_2 *attr = to_sensor_dev_attr_2(devattr);
>  	struct platform_data *pdata = dev_get_drvdata(dev);
>  
>  	return sprintf(buf, "%d\n", pdata->core_data[attr->index]->ttarget);
>  }
>  
> +static const u32 thresh_log_mask[2] = {
> +		THERM_LOG_THRESHOLD0, THERM_LOG_THRESHOLD1 };
> +
> +static ssize_t show_thresh_triggered(struct device *dev,
> +				     struct device_attribute *devattr,
> +				     char *buf)
> +{
> +	struct sensor_device_attribute_2 *attr = to_sensor_dev_attr_2(devattr);
> +	struct platform_data *pdata = dev_get_drvdata(dev);
> +	struct temp_data *tdata = pdata->core_data[attr->index];
> +	u32 eax, edx;
> +
> +	rdmsr_on_cpu(tdata->cpu, tdata->status_reg, &eax, &edx);
> +
> +	return sprintf(buf, "%d\n", !!(eax & thresh_log_mask[attr->nr]));
> +}
> +
> +static ssize_t store_thresh_triggered(struct device *dev,
> +				      struct device_attribute *devattr,
> +				      const char *buf, size_t count)
> +{
> +	struct platform_data *pdata = dev_get_drvdata(dev);
> +	struct sensor_device_attribute_2 *attr = to_sensor_dev_attr_2(devattr);

Nitpicking: for consistency with the other callback functions, I'd put
this line first (same for the two following functions.)

> +	struct temp_data *tdata = pdata->core_data[attr->index];
> +	u32 mask = thresh_log_mask[attr->nr];
> +	u32 eax, edx;
> +	unsigned long val;
> +
> +	if (strict_strtoul(buf, 10, &val) || val != 0)
> +		return -EINVAL;
> +
> +	rdmsr_on_cpu(tdata->cpu, tdata->status_reg, &eax, &edx);
> +	if (eax & mask)

I wouldn't bother with this test. If the user (root) asks for a write,
just write. The user isn't supposed to write 0 to this file randomly,
so odds are that the test will always succeed. As a matter of fact, you
don't do any equality test when writing the threshold value itself.

> +		wrmsr_on_cpu(tdata->cpu, tdata->status_reg, eax & ~mask, edx);
> +	return count;
> +}
> +
> +static const u32 thresh_mask[2] = {
> +		THERM_MASK_THRESHOLD0, THERM_MASK_THRESHOLD1 };
> +static const u32 thresh_shift[2] = {
> +		THERM_SHIFT_THRESHOLD0, THERM_SHIFT_THRESHOLD1 };

u32 for a shift value seems odd, unsigned int would be more natural, or
u8 if you really want to save space (but I don't think it makes much
sense here.)

> +static ssize_t show_thresh(struct device *dev,
> +			   struct device_attribute *devattr, char *buf)
> +{
> +	struct platform_data *pdata = dev_get_drvdata(dev);
> +	struct sensor_device_attribute_2 *attr = to_sensor_dev_attr_2(devattr);
> +	struct temp_data *tdata = pdata->core_data[attr->index];
> +	u32 eax, edx;
> +	int thresh;
> +	u32 mask = thresh_mask[attr->nr];
> +	u32 shift = thresh_shift[attr->nr];

Nitpicking: ordering here (and below) is different from what you chose
for store_thresh_triggered. Code is easier to read when order is
consistent.

> +
> +	rdmsr_on_cpu(tdata->cpu, tdata->intrpt_reg, &eax, &edx);
> +	thresh = tdata->tjmax - ((eax & mask) >> shift) * 1000;
> +	return sprintf(buf, "%d\n", thresh);
> +}
> +
> +static ssize_t store_thresh(struct device *dev,
> +			    struct device_attribute *devattr,
> +			    const char *buf, size_t count)
> +{
> +	struct platform_data *pdata = dev_get_drvdata(dev);
> +	struct sensor_device_attribute_2 *attr = to_sensor_dev_attr_2(devattr);
> +	struct temp_data *tdata = pdata->core_data[attr->index];
> +	u32 eax, edx;
> +	long val, diff;
> +	u32 mask = thresh_mask[attr->nr];
> +	u32 shift = thresh_shift[attr->nr];
> +
> +	if (strict_strtol(buf, 10, &val))
> +		return -EINVAL;
> +
> +	diff = (tdata->tjmax - val) / 1000;
> +	if (diff < 0 || diff > 127)
> +		return -EINVAL;
> +
> +	mutex_lock(&tdata->update_lock);
> +	rdmsr_on_cpu(tdata->cpu, tdata->intrpt_reg, &eax, &edx);
> +	eax = (eax & ~mask) | (diff << shift);
> +	wrmsr_on_cpu(tdata->cpu, tdata->intrpt_reg, eax, edx);
> +	mutex_unlock(&tdata->update_lock);

Why do you need the mutex here, and not in store_thresh_triggered()?

> +
> +	return count;
> +}
> (...)

-- 
Jean Delvare

_______________________________________________
lm-sensors mailing list
lm-sensors@xxxxxxxxxxxxxx
http://lists.lm-sensors.org/mailman/listinfo/lm-sensors