On Mon, Mar 04, 2013 at 08:21:48PM +0100, Martin Peres wrote:
> Hi Konrad,
>
> On 04/03/2013 19:40, Konrad Rzeszutek Wilk wrote:
> > After git merge ab7826595e9ec51a51f622c5fc91e2f59440481a
> > (Merge tag 'mfd-3.9-1' of
> > git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-2.6)
> > the nouveau driver ends up shutting off the machine when booting.
> >
> > I hadn't done a git bisection yet and was wondering if there are some
> > juicy commits I ought to look at?
>
> Sure, no need to bisect, it is a new (apparently-broken-for-you) feature.
> The code is in /drivers/gpu/drm/nouveau/core/subdev/therm/
>
> > Here is the serial console:
> >
> > [ 6.940628] nouveau [ PTHERM][0000:00:0d.0] Thermal management: disabled
> > [ 6.957474] nouveau [ PTHERM][0000:00:0d.0] programmed thresholds [ 90(2), 95(3), 145(2), 135(5) ]
> > [ 6.966594] nouveau 6.975100] nouveau [ PTHERM][0000:00:0d.0] Thermal management: automatic
> > [ 6.982059] nouveau [ PTHERM][0000:00:0d.0] temperature (88 C) hit the 'downclock' threshold
> > [ 6.990680] nouveau [ PTHERM][0000:00:0d.0] temperature (88 C) hit the 'critical' threshold
> > [ 6.999194] nouveau [ PTHERM][0000:00:0d.0] temperature (90 C) hit the 'shutdown' threshold
>
> See, this is strange. If I believe the "programmed thresholds" line,
> the fanboost threshold is at 90°C, downclock is at 95°C, critical
> temperature is at 145°C and shutdown is at 135°C.
> So, from the BIOS side, things seem to be in fairly good shape
> (critical should be lower than shutdown, but that's OK).
>
> My theory is that your temperature sensor reading is so variable that
> it sets off the shutdown alarm. So, either the sensor needs more
> settling time or the output is genuinely very variable.

You should see it when I boot it under Xen:

[ 8.427789] nouveau [ PTHERM][0000:00:0d.0] programmed thresholds [ 90(2), 95(3), 145(2), 135(5) ]
[ 8.427855] nouveau [ PTHERM][0000:00:0d.0] temperature (222 C) hit the 'fanboost' threshold
[ 8.427919] nouveau [ PTHERM][0000:00:0d.0] Thermal management: automatic
[ 8.427973] nouveau [ PTHERM][0000:00:0d.0] temperature (222 C) hit the 'downclock' threshold
[ 8.428036] nouveau [ PTHERM][0000:00:0d.0] temperature (222 C) hit the 'critical' threshold
[ 8.428099] nouveau [ PTHERM][0000:00:0d.0] temperature (222 C) hit the 'shutdown' threshold

> In the first case, we could fix that by increasing the settling time
> (at the expense of a longer boot period). We could also force a 10s
> wait at boot time before reading the temperature.
> If this is the latter case, the only solution we have is to average the
> temperature over several samples. I would need statistics on the
> variability in order to calculate a proper low-pass filter that
> wouldn't be too slow or too RAM/wakeup-intensive.
>
> I really hope the problem is the settling time!
>
> Here is what you can do to test the theory:
>
> Change the mdelay at line 41 of
> /drivers/gpu/drm/nouveau/core/subdev/therm/nv40.c
> (http://cgit.freedesktop.org/nouveau/linux-2.6/tree/drivers/gpu/drm/nouveau/core/subdev/therm/nv40.c#n41)
> from 10 to 1000.
> Please also add an mdelay of 1000 between lines 44 and 45.

Let me do that tomorrow and report my findings.

> If it works with this patch, then try decreasing the delay to 20ms.
>
> In any case, I'll send some thermal patches tonight to be more
> resistant to long settling times.

Please CC me in case you would like me to also test them with the mdelay patch.

> Thanks for reporting!

Of course.

> Martin (mupuf)
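
For reference, the "average the temperature over several samples" idea quoted above could look roughly like the following. This is only a sketch under stated assumptions, not nouveau code: the names `temp_filter`, `temp_filter_init`, `temp_filter_update` and `over_shutdown` are made up for illustration, the filter is a plain exponential moving average with a power-of-two weight, and the 135 C shutdown value is simply taken from the "programmed thresholds" log line.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sketch; none of these names exist in the nouveau driver. */

/* Shutdown threshold in degrees C, taken from the log line above. */
#define THRS_SHUTDOWN 135

struct temp_filter {
	int acc;      /* filtered temperature, scaled by (1 << shift) */
	int shift;    /* smoothing factor: weight of a new sample is 1 / (1 << shift) */
	bool primed;  /* false until the first sample has been seen */
};

static void temp_filter_init(struct temp_filter *f, int shift)
{
	assert(shift >= 0 && shift < 8);
	f->acc = 0;
	f->shift = shift;
	f->primed = false;
}

/* Feed one raw sample (degrees C) and return the filtered temperature. */
static int temp_filter_update(struct temp_filter *f, int sample)
{
	int scale = 1 << f->shift;

	if (!f->primed) {
		/* Start from the first reading instead of from zero. */
		f->acc = sample * scale;
		f->primed = true;
	} else {
		/* acc += sample - acc / scale, i.e. an exponential moving average. */
		f->acc += sample - f->acc / scale;
	}
	return f->acc / scale;
}

/* True if the given temperature is at or above the shutdown threshold. */
static bool over_shutdown(int temp)
{
	return temp >= THRS_SHUTDOWN;
}
```

With `shift = 2`, a single 222 C spike after a steady 88 C reading is filtered down to 121 C, which stays below the shutdown threshold. The filter only helps against isolated spikes: if the very first sample is bogus, or every sample reads 222 C as in the Xen log, the filtered value follows it, which is the case a longer settling delay is meant to cover.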