On 04/03/2013 22:41, Konrad Rzeszutek
Wilk wrote:
Pls CC me in case you would like me also to test them with the mdelay patch. Hi Konrad, Marcin proposed me another explanation for the issue you are seeing and it made me look again at the code. I don't have enough nv4x hw to test all the conditions but with the attached patches, you may get a saner behaviour than a computer that shut-downs whenever you turn it on (like a "most useless machine ever"). The most important patch is the 8th one. Please try applying them on top of your 3.9-rc1 kernel and send me back your kernel logs + sensors output. Cheers, Martin PS: The attached patches are parts of my current thermal-related queue. I'll post them soon to the list. - http://gitorious.org/linux-nouveau-pm/linux-nouveau-pm/commits/thermal |
>From e2a10f1e7060b0cbabb032590ec2588f952016fc Mon Sep 17 00:00:00 2001 From: Martin Peres <martin.peres@xxxxxxxx> Date: Tue, 5 Mar 2013 10:26:30 +0100 Subject: [PATCH 1/8] drm/nv40/therm: improve selection between the old and the new style The condition to select between the old and new style was a thinko as rnndb orders chipsets based on their release date (or general chronologie hw-wise) and not based on their chipset number. As the nv40 family is a mess when it comes to numbers, this patch introduces a switch-based selection between the old and new style. Signed-off-by: Martin Peres <martin.peres@xxxxxxxx> --- drivers/gpu/drm/nouveau/core/subdev/therm/nv40.c | 50 ++++++++++++++++++------ 1 file changed, 38 insertions(+), 12 deletions(-) diff --git a/drivers/gpu/drm/nouveau/core/subdev/therm/nv40.c b/drivers/gpu/drm/nouveau/core/subdev/therm/nv40.c index 0f5363e..9d9ecaa 100644 --- a/drivers/gpu/drm/nouveau/core/subdev/therm/nv40.c +++ b/drivers/gpu/drm/nouveau/core/subdev/therm/nv40.c @@ -29,42 +29,68 @@ struct nv40_therm_priv { struct nouveau_therm_priv base; }; +enum nv40_sensor_style { INVALID_STYLE = -1, OLD_STYLE = 0, NEW_STYLE = 1 }; + +static enum nv40_sensor_style +nv40_is_older_style_sensor(struct nouveau_therm *therm) +{ + struct nouveau_device *device = nv_device(therm); + + switch (device->chipset) { + case 0x43: + case 0x44: + case 0x4a: + case 0x47: + return OLD_STYLE; + + case 0x46: + case 0x49: + case 0x4b: + case 0x4e: + case 0x4c: + case 0x67: + case 0x68: + case 0x63: + return NEW_STYLE; + default: + return INVALID_STYLE; + } +} + static int nv40_sensor_setup(struct nouveau_therm *therm) { - struct nouveau_device *device = nv_device(therm); + enum nv40_sensor_style style = nv40_is_older_style_sensor(therm); /* enable ADC readout and disable the ALARM threshold */ - if (device->chipset >= 0x46) { + if (style == NEW_STYLE) { nv_mask(therm, 0x15b8, 0x80000000, 0); nv_wr32(therm, 0x15b0, 0x80003fff); mdelay(10); /* wait for the temperature to stabilize */ return nv_rd32(therm, 0x15b4) & 0x3fff; - } else { + } else if (style == OLD_STYLE) { nv_wr32(therm, 0x15b0, 0xff); return nv_rd32(therm, 0x15b4) & 0xff; - } + } else + return -ENODEV; } static int nv40_temp_get(struct nouveau_therm *therm) { struct nouveau_therm_priv *priv = (void *)therm; - struct nouveau_device *device = nv_device(therm); struct nvbios_therm_sensor *sensor = &priv->bios_sensor; + enum nv40_sensor_style style = nv40_is_older_style_sensor(therm); int core_temp; - if (device->chipset >= 0x46) { + if (style == NEW_STYLE) { nv_wr32(therm, 0x15b0, 0x80003fff); core_temp = nv_rd32(therm, 0x15b4) & 0x3fff; - } else { + } else if (style == OLD_STYLE) { nv_wr32(therm, 0x15b0, 0xff); core_temp = nv_rd32(therm, 0x15b4) & 0xff; - } - - /* Setup the sensor if the temperature is 0 */ - if (core_temp == 0) - core_temp = nv40_sensor_setup(therm); + } else + return -ENODEV; if (sensor->slope_div == 0) sensor->slope_div = 1; -- 1.8.1.5
>From 38893c70faa9bbedded908694c3344d755cb05bf Mon Sep 17 00:00:00 2001 From: Martin Peres <martin.peres@xxxxxxxx> Date: Tue, 5 Mar 2013 10:35:20 +0100 Subject: [PATCH 2/8] drm/nv40/therm: increase the sensor's settling delay to 20ms Based on my experience, 10ms wasn't always enough. Let's bump that to a little more. If this turns out to be insufficient-enough again, then an approach based on letting the sensor settle for several seconds before starting polling on the temperature would be better suited. This way, boot time wouldn't be impacted by those waits too much. Signed-off-by: Martin Peres <martin.peres@xxxxxxxx> --- drivers/gpu/drm/nouveau/core/subdev/therm/nv40.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/nouveau/core/subdev/therm/nv40.c b/drivers/gpu/drm/nouveau/core/subdev/therm/nv40.c index 9d9ecaa..818060e 100644 --- a/drivers/gpu/drm/nouveau/core/subdev/therm/nv40.c +++ b/drivers/gpu/drm/nouveau/core/subdev/therm/nv40.c @@ -66,10 +66,11 @@ nv40_sensor_setup(struct nouveau_therm *therm) if (style == NEW_STYLE) { nv_mask(therm, 0x15b8, 0x80000000, 0); nv_wr32(therm, 0x15b0, 0x80003fff); - mdelay(10); /* wait for the temperature to stabilize */ + mdelay(20); /* wait for the temperature to stabilize */ return nv_rd32(therm, 0x15b4) & 0x3fff; } else if (style == OLD_STYLE) { nv_wr32(therm, 0x15b0, 0xff); + mdelay(20); /* wait for the temperature to stabilize */ return nv_rd32(therm, 0x15b4) & 0xff; } else return -ENODEV; -- 1.8.1.5
>From 9b9ab3da97590816b9c61e9d272bbcb712db8d42 Mon Sep 17 00:00:00 2001 From: Martin Peres <martin.peres@xxxxxxxx> Date: Tue, 5 Mar 2013 10:44:12 +0100 Subject: [PATCH 3/8] drm/nouveau/therm: do not make assumptions on temperature In nouveau_therm_sensor_event, temperature is stored as an uint8_t even though the original interface returns an int. This change should make it more obvious when the sensor is either very-ill-calibrated or when we selected the wrong sensor style on the nv40 family. Signed-off-by: Martin Peres <martin.peres@xxxxxxxx> --- drivers/gpu/drm/nouveau/core/subdev/therm/temp.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/nouveau/core/subdev/therm/temp.c b/drivers/gpu/drm/nouveau/core/subdev/therm/temp.c index b37624a..0a17b95 100644 --- a/drivers/gpu/drm/nouveau/core/subdev/therm/temp.c +++ b/drivers/gpu/drm/nouveau/core/subdev/therm/temp.c @@ -106,16 +106,16 @@ void nouveau_therm_sensor_event(struct nouveau_therm *therm, const char *thresolds[] = { "fanboost", "downclock", "critical", "shutdown" }; - uint8_t temperature = therm->temp_get(therm); + int temperature = therm->temp_get(therm); if (thrs < 0 || thrs > 3) return; if (dir == NOUVEAU_THERM_THRS_FALLING) - nv_info(therm, "temperature (%u C) went below the '%s' threshold\n", + nv_info(therm, "temperature (%i C) went below the '%s' threshold\n", temperature, thresolds[thrs]); else - nv_info(therm, "temperature (%u C) hit the '%s' threshold\n", + nv_info(therm, "temperature (%i C) hit the '%s' threshold\n", temperature, thresolds[thrs]); active = (dir == NOUVEAU_THERM_THRS_RISING); -- 1.8.1.5
>From c277fbe506f32cf624d302eafa210746ed796b96 Mon Sep 17 00:00:00 2001 From: Martin Peres <martin.peres@xxxxxxxx> Date: Tue, 5 Mar 2013 10:58:59 +0100 Subject: [PATCH 4/8] drm/nouveau/therm: remove some confusion introduced by therm_mode The kernel message "[ PTHERM][0000:01:00.0] Thermal management: disabled" is misleading as it actually means "fan management: disabled". This patch fixes both the source and the message to improve readability. Signed-off-by: Martin Peres <martin.peres@xxxxxxxx> --- drivers/gpu/drm/nouveau/core/include/subdev/therm.h | 2 +- drivers/gpu/drm/nouveau/core/subdev/therm/base.c | 10 +++++----- drivers/gpu/drm/nouveau/core/subdev/therm/priv.h | 2 +- drivers/gpu/drm/nouveau/core/subdev/therm/temp.c | 2 +- 4 files changed, 8 insertions(+), 8 deletions(-) diff --git a/drivers/gpu/drm/nouveau/core/include/subdev/therm.h b/drivers/gpu/drm/nouveau/core/include/subdev/therm.h index 6b17b61..0b20fc0 100644 --- a/drivers/gpu/drm/nouveau/core/include/subdev/therm.h +++ b/drivers/gpu/drm/nouveau/core/include/subdev/therm.h @@ -4,7 +4,7 @@ #include <core/device.h> #include <core/subdev.h> -enum nouveau_therm_mode { +enum nouveau_therm_fan_mode { NOUVEAU_THERM_CTRL_NONE = 0, NOUVEAU_THERM_CTRL_MANUAL = 1, NOUVEAU_THERM_CTRL_AUTO = 2, diff --git a/drivers/gpu/drm/nouveau/core/subdev/therm/base.c b/drivers/gpu/drm/nouveau/core/subdev/therm/base.c index f794dc8..321a55b 100644 --- a/drivers/gpu/drm/nouveau/core/subdev/therm/base.c +++ b/drivers/gpu/drm/nouveau/core/subdev/therm/base.c @@ -134,7 +134,7 @@ nouveau_therm_alarm(struct nouveau_alarm *alarm) } int -nouveau_therm_mode(struct nouveau_therm *therm, int mode) +nouveau_therm_fan_mode(struct nouveau_therm *therm, int mode) { struct nouveau_therm_priv *priv = (void *)therm; struct nouveau_device *device = nv_device(therm); @@ -152,7 +152,7 @@ nouveau_therm_mode(struct nouveau_therm *therm, int mode) if (priv->mode == mode) return 0; - nv_info(therm, "Thermal management: %s\n", name[mode]); + nv_info(therm, "fan management: %s\n", name[mode]); nouveau_therm_update(therm, mode); return 0; } @@ -213,7 +213,7 @@ nouveau_therm_attr_set(struct nouveau_therm *therm, priv->fan->bios.max_duty = value; return 0; case NOUVEAU_THERM_ATTR_FAN_MODE: - return nouveau_therm_mode(therm, value); + return nouveau_therm_fan_mode(therm, value); case NOUVEAU_THERM_ATTR_THRS_FAN_BOOST: priv->bios_sensor.thrs_fan_boost.temp = value; priv->sensor.program_alarms(therm); @@ -263,7 +263,7 @@ _nouveau_therm_init(struct nouveau_object *object) return ret; if (priv->suspend >= 0) - nouveau_therm_mode(therm, priv->mode); + nouveau_therm_fan_mode(therm, priv->mode); priv->sensor.program_alarms(therm); return 0; } @@ -317,7 +317,7 @@ nouveau_therm_preinit(struct nouveau_therm *therm) nouveau_therm_sensor_ctor(therm); nouveau_therm_fan_ctor(therm); - nouveau_therm_mode(therm, NOUVEAU_THERM_CTRL_NONE); + nouveau_therm_fan_mode(therm, NOUVEAU_THERM_CTRL_NONE); return 0; } diff --git a/drivers/gpu/drm/nouveau/core/subdev/therm/priv.h b/drivers/gpu/drm/nouveau/core/subdev/therm/priv.h index 06b9870..d8483dd 100644 --- a/drivers/gpu/drm/nouveau/core/subdev/therm/priv.h +++ b/drivers/gpu/drm/nouveau/core/subdev/therm/priv.h @@ -102,7 +102,7 @@ struct nouveau_therm_priv { struct i2c_client *ic; }; -int nouveau_therm_mode(struct nouveau_therm *therm, int mode); +int nouveau_therm_fan_mode(struct nouveau_therm *therm, int mode); int nouveau_therm_attr_get(struct nouveau_therm *therm, enum nouveau_therm_attr_type type); int nouveau_therm_attr_set(struct nouveau_therm *therm, diff --git a/drivers/gpu/drm/nouveau/core/subdev/therm/temp.c b/drivers/gpu/drm/nouveau/core/subdev/therm/temp.c index 0a17b95..441f60b 100644 --- a/drivers/gpu/drm/nouveau/core/subdev/therm/temp.c +++ b/drivers/gpu/drm/nouveau/core/subdev/therm/temp.c @@ -123,7 +123,7 @@ void nouveau_therm_sensor_event(struct nouveau_therm *therm, case NOUVEAU_THERM_THRS_FANBOOST: if (active) { nouveau_therm_fan_set(therm, true, 100); - nouveau_therm_mode(therm, NOUVEAU_THERM_CTRL_AUTO); + nouveau_therm_fan_mode(therm, NOUVEAU_THERM_CTRL_AUTO); } break; case NOUVEAU_THERM_THRS_DOWNCLOCK: -- 1.8.1.5
>From 60dce3447342d7bb1122e90c3f0aa63573e0a9b4 Mon Sep 17 00:00:00 2001 From: Martin Peres <martin.peres@xxxxxxxx> Date: Tue, 5 Mar 2013 10:38:37 +0100 Subject: [PATCH 8/8] drm/nv40/therm: <DO NOT PUSH> move nv4c to the newer temperature-reading style This is a guess made by joi and that may quite likely be true --- drivers/gpu/drm/nouveau/core/subdev/therm/nv40.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/nouveau/core/subdev/therm/nv40.c b/drivers/gpu/drm/nouveau/core/subdev/therm/nv40.c index d546ada..2b24667 100644 --- a/drivers/gpu/drm/nouveau/core/subdev/therm/nv40.c +++ b/drivers/gpu/drm/nouveau/core/subdev/therm/nv40.c @@ -41,13 +41,13 @@ nv40_is_older_style_sensor(struct nouveau_therm *therm) case 0x44: case 0x4a: case 0x47: + case 0x4c: return OLD_STYLE; case 0x46: case 0x49: case 0x4b: case 0x4e: - case 0x4c: case 0x67: case 0x68: case 0x63: -- 1.8.1.5
_______________________________________________ dri-devel mailing list dri-devel@xxxxxxxxxxxxxxxxxxxxx http://lists.freedesktop.org/mailman/listinfo/dri-devel