Re: Buggy BIOS on the HP TX2500-series

On Thursday 09 October 2008 08:36:56 Zhang Rui wrote:
> On Wed, 2008-10-08 at 19:04 -0700, Len Brown wrote:
> > On Thu, 9 Oct 2008, Matthew Garrett wrote:
> > > On Thu, Oct 09, 2008 at 08:59:15AM +0800, Zhang Rui wrote:
> > > > On Wed, 2008-10-08 at 12:26 -0700, Matthew Garrett wrote:
> > > > > A patch went into the kernel earlier this year to ignore critical
> > > > > trip points that were below 0.
> > > >
> > > > well, I think this patch is wrong.
> > > > a critical trip point below 0 Celsius doesn't mean it's invalid.
> > >
> > > I think it's pretty clear that a critical trip point below 0 celsius
> > > means that the critical trip point is invalid,
>
> well, I agree that this workaround can work here.
> But I think the proper way to fix this issue is that,
> ACPICA returns some error code(AE_BAD_DATA) like it did before,
> and the thermal driver knows that it got an invalid value and
> should take it carefully.
> or else we may still get potential problems, e.g. what if there
> is no return value of _PSV method.
>
> > though I agree that
> >
> > > ignoring the entire thermal zone as a result is somewhat unfortunate.
> > >
> > > > windows can work well on this laptop.
> > > > please look at:
> > > > http://bugzilla.kernel.org/show_bug.cgi?id=10686#c13
> > > > IMO, we need to fix the ACPICA code first of all.
> > > >
> > > > Ming, what do you think of the patch in comment #15 and #16?
> > >
> > > We could quibble over the technical correctness of this approach, but
> > > it seems to behave in exactly the same way - ie, Linux will ignore the
> > > thermal zone? The existing code seems fine, other than the fact that a
> > > bad _CRT will result everything failing. I think we'd be better off
> > > just losing the return -ENODEV there and try to use as much of the
> > > thermal information as we can.
> >
> > right, when we put in the workaround we observed that a bad _CRT
> > would delete an entire thermal zone, and that could be a big
> > problem on a box with active cooling on that thermal zone.
>
> Hmm, I think a patch like this is enough to fix this problem.
>
>
> ignore invalid critical trip point.
>
> Signed-off-by: Zhang Rui <rui.zhang@xxxxxxxxx>
> ---
>  drivers/acpi/thermal.c |    3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> Index: linux-2.6/drivers/acpi/thermal.c
> ===================================================================
> --- linux-2.6.orig/drivers/acpi/thermal.c
> +++ linux-2.6/drivers/acpi/thermal.c
> @@ -373,9 +373,8 @@ static int acpi_thermal_trips_update(str
>  		if (ACPI_FAILURE(status) ||
>  				tz->trips.critical.temperature <= 2732) {
>  			tz->trips.critical.flags.valid = 0;
> -			ACPI_EXCEPTION((AE_INFO, status,
> +			ACPI_DEBUG_PRINT((ACPI_DB_WARN,
>  					"No or invalid critical threshold"));
> -			return -ENODEV;
I agree with not returning -ENODEV. I expect that to invalidate only the
critical trip point and still let the thermal zone do its (possibly
important) job.
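
To illustrate the behavior I mean, here is a minimal userspace sketch, not
the driver code itself. The struct and the helper name are hypothetical
stand-ins; only the 2732 threshold (tenths of a kelvin, roughly 0 Celsius)
and the message text come from the patch above.

```c
#include <stdbool.h>
#include <stdio.h>

/* ACPI reports temperatures in tenths of a kelvin; 2732 is about 0 C. */
#define ZERO_CELSIUS_DECIKELVIN 2732

/* Hypothetical stand-in for the driver's per-zone trip-point state. */
struct trip_points {
	bool critical_valid;
	long critical_temp;	/* tenths of a kelvin */
	bool passive_valid;
	long passive_temp;	/* tenths of a kelvin */
};

/*
 * Hypothetical helper. eval_ok says whether evaluating _CRT succeeded,
 * temp is the value it returned. A failed evaluation or a value at or
 * below 0 C invalidates only the critical trip point: the problem is
 * reported, nothing else in the zone is touched, and the caller is free
 * to carry on with the remaining trip points.
 * Returns true if the critical trip point is usable.
 */
bool update_critical_trip(struct trip_points *tp, bool eval_ok, long temp)
{
	if (!eval_ok || temp <= ZERO_CELSIUS_DECIKELVIN) {
		tp->critical_valid = false;
		/* Keep the message visible instead of hiding it. */
		fprintf(stderr, "No or invalid critical threshold\n");
		return false;
	}

	tp->critical_valid = true;
	tp->critical_temp = temp;
	return true;
}
```

With a zone whose passive trip point is already valid, a bogus critical
value only clears critical_valid; passive_valid and passive_temp stay as
they were.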

I DO NOT AGREE with removing the exception!
Corner cases like this will break again sooner or later. Hiding the message
completely will cause a lot of grief for people debugging thermal issues on
their machines two kernel versions from now, when this crappy BIOS triggers
some bad side effect. They will lose precious time because the message was
hidden.

Also, this BIOS should never get a certification sticker from Novell or any
other Linux distro. Our certification people try hard to identify such BIOS
bugs, reject such BIOSes, and force the vendors to fix them. Guessing Windows
behavior and trying to support it is not an option: history has shown that
such Windows compatibility efforts can take years until the behavior is
guessed and adopted correctly.

I am going to resend my [Firmware Bug] interface, with this case as its first
user. It is a classic example of why the interface is urgently needed.
Once it's in, I promise to convert other parts as well, so that it becomes
meaningful to do:
dmesg | grep -F "[Firmware Bug]"
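
As a rough sketch of the idea, assuming the interface boils down to a fixed
string prefix on the message: the FW_BUG macro value and both helper names
below are illustrative assumptions, not the interface as posted.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Assumed prefix; chosen so that one fixed string finds every report. */
#define FW_BUG "[Firmware Bug]: "

/*
 * Hypothetical helper: write msg into buf with the firmware-bug prefix.
 * Returns what snprintf returns, i.e. the length the full line needs.
 */
int format_fw_bug(char *buf, size_t len, const char *msg)
{
	return snprintf(buf, len, FW_BUG "%s", msg);
}

/* Hypothetical helper: fixed-string match, as a literal log search would do. */
bool is_fw_bug_line(const char *line)
{
	return strstr(line, "[Firmware Bug]") != NULL;
}
```

The point of a single shared prefix is that every report of broken firmware
can be found with one literal search, whichever driver printed it.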

Have you seen that, Len?
Hmm, I forgot to add the acpi list to the latest version; I only added the
cpufreq list. I am going to resend the patches. On the cpufreq list it was:
Subject: Resend: Introduce interface to report BIOS bugs (reworked: FW_BUG 
simple solution, large description)

Thanks,

     Thomas