Re: [PATCH] BIOS SATA legacy mode failure

Aaron Lu <aaron.lu@xxxxxxxxx> · Tue, 22 Oct 2013 10:12:26 +0800

On 10/22/2013 09:34 AM, Robert Hancock wrote:
> On 10/16/2013 08:42 AM, Levente Kurusa wrote:
>> On 2013-10-16 02:16, Robert Hancock wrote:
>>> On Sun, Oct 13, 2013 at 6:02 AM, Levente Kurusa <levex@xxxxxxxxx> wrote:
>>>> On 2013-10-13 07:57, Robert Hancock wrote:
>>>>> On Sat, Oct 12, 2013 at 3:29 AM, Levente Kurusa <levex@xxxxxxxxx> wrote:
>>>>>> On 2013-10-12 04:06, Robert Hancock wrote:
>>>>>>> On Fri, Oct 11, 2013 at 10:07 AM, Levente Kurusa <levex@xxxxxxxxx> wrote:
>>>>>>>> On 2013-10-01 06:25, Robert Hancock wrote:
>>>>>>>>> On Sat, Sep 28, 2013 at 7:21 PM, Robert Hancock <hancockrwd@xxxxxxxxx> wrote:
>>>>>>>>>> On Sat, Sep 28, 2013 at 11:46 AM, Levente Kurusa <levex@xxxxxxxxx> wrote:
>>>>>>>>>>> On 2013-09-28 06:55, Robert Hancock wrote:
>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Sep 27, 2013 at 7:24 AM, Levente Kurusa <levex@xxxxxxxxx> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 2013-09-25 08:31, Robert Hancock wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Sun, Sep 22, 2013 at 1:13 AM, Levente Kurusa <levex@xxxxxxxxx> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 2013-09-21 19:04, Robert Hancock wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Sat, Sep 21, 2013 at 1:35 AM, Levente Kurusa <levex@xxxxxxxxx>
>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> The following dmesg output repeats in an infinite loop.
>>>>>>>>>>>>>>>>>>>>>>> dmesg:
>>>>>>>>>>>>>>>>>>>>>>> ata3: lost interrupt (Status 0x50)
>>>>>>>>>>>>>>>>>>>>>>> ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
>>>>>>>>>>>>>>>>>>>>>>> ata3.00: failed command: READ DMA
>>>>>>>>>>>>>>>>>>>>>>> ata3.00: cmd c8/00:08:00:00:00/00:00:00:00:00/e0 tag 0 dma 4096 in
>>>>>>>>>>>>>>>>>>>>>>>          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
>>>>>>>>>>>>>>>>>>>>>>> ata3.00: status: { DRDY }
>>>>>>>>>>>>>>>>>>>>>>> ata3: soft resetting link
>>>>>>>>>>>>>>>>>>>>>>> ata3.00: configured for UDMA/33 (no error)
>>>>>>>>>>>>>>>>>>>>>>> ata3.00: device reported invalid CHS sector 0
>>>>>>>>>>>>>>>>>>>>>>> ata3: EH complete
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Patch that fixes the infinite loop:
>>>>>>>>>>>>>>>>>>>>>>> diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
>>>>>>>>>>>>>>>>>>>>>>> index f9476fb..eeedf80 100644
>>>>>>>>>>>>>>>>>>>>>>> --- a/drivers/ata/libata-eh.c
>>>>>>>>>>>>>>>>>>>>>>> +++ b/drivers/ata/libata-eh.c
>>>>>>>>>>>>>>>>>>>>>>> @@ -2437,6 +2437,14 @@ static void ata_eh_link_report(struct ata_link *link)
>>>>>>>>>>>>>>>>>>>>>>>                                   ehc->i.action, frozen, tries_buf);
>>>>>>>>>>>>>>>>>>>>>>>                       if (desc)
>>>>>>>>>>>>>>>>>>>>>>>                               ata_dev_err(ehc->i.dev, "%s\n", desc);
>>>>>>>>>>>>>>>>>>>>>>> +               ehc->i.dev->exce_cnt++;
>>>>>>>>>>>>>>>>>>>>>>> +               ata_dev_warn(ehc->i.dev, "Number of exceptions: %d\n",
>>>>>>>>>>>>>>>>>>>>>>> +                            ehc->i.dev->exce_cnt);
>>>>>>>>>>>>>>>>>>>>>>> +               /*
>>>>>>>>>>>>>>>>>>>>>>> +                * The device is failing terribly,
>>>>>>>>>>>>>>>>>>>>>>> +                * disable it to prevent damage.
>>>>>>>>>>>>>>>>>>>>>>> +                */
>>>>>>>>>>>>>>>>>>>>>>> +               if (ehc->i.dev->exce_cnt > 2)
>>>>>>>>>>>>>>>>>>>>>>> +                       ata_dev_disable(ehc->i.dev);
>>>>>>>>>>>>>>>>>>>>>>>               } else {
>>>>>>>>>>>>>>>>>>>>>>>                       ata_link_err(link, "exception Emask 0x%x "
>>>>>>>>>>>>>>>>>>>>>>>                                    "SAct 0x%x SErr 0x%x action 0x%x%s%s\n",
>>>>>>>>>>>>>>>>>>>>>>> diff --git a/include/linux/libata.h b/include/linux/libata.h
>>>>>>>>>>>>>>>>>>>>>>> index eae7a05..fa52ee6 100644
>>>>>>>>>>>>>>>>>>>>>>> --- a/include/linux/libata.h
>>>>>>>>>>>>>>>>>>>>>>> +++ b/include/linux/libata.h
>>>>>>>>>>>>>>>>>>>>>>> @@ -660,7 +660,8 @@ struct ata_device {
>>>>>>>>>>>>>>>>>>>>>>>               u8                      devslp_timing[ATA_LOG_DEVSLP_SIZE];
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>               /* error history */
>>>>>>>>>>>>>>>>>>>>>>> -       int                     spdn_cnt;
>>>>>>>>>>>>>>>>>>>>>>> +       int                     spdn_cnt; /* Number of speed_downs */
>>>>>>>>>>>>>>>>>>>>>>> +       int                     exce_cnt; /* Number of exceptions that happened */
>>>>>>>>>>>>>>>>>>>>>>>               /* ering is CLEAR_END, read comment above CLEAR_END */
>>>>>>>>>>>>>>>>>>>>>>>               struct ata_ering        ering;
>>>>>>>>>>>>>>>>>>>>>>>        };
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> This doesn't seem like a very good fix. It may prevent the apparent
>>>>>>>>>>>>>>>>>>>>>> infinite loop but will just prevent that device from functioning at all.
>>>>>>>>>>>>>>>>>>>>>> It would be better if we could figure out what was actually going wrong.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I have tested the problem with three different computers, all switched to
>>>>>>>>>>>>>>>>>>>>> legacy/IDE/compatibility mode, and they didn't have this problem. Of
>>>>>>>>>>>>>>>>>>>>> course, they could have been set to AHCI mode, and there the kernel would
>>>>>>>>>>>>>>>>>>>>> boot normally. Feels strange, but so far I was only able to reproduce the
>>>>>>>>>>>>>>>>>>>>> problem with a Toshiba MK8052GSX. On the topic of my patch, I still don't
>>>>>>>>>>>>>>>>>>>>> see why a device which fails so terribly that it reports 3 exceptions
>>>>>>>>>>>>>>>>>>>>> shouldn't be disabled. Like in this case, it could cause infinite loops.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> The problem is that this could happen in some cases when you wouldn't want
>>>>>>>>>>>>>>>>>>>> to disable the device, like an error that just happens sporadically and
>>>>>>>>>>>>>>>>>>>> works on retry, or a device you're trying to recover data from.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> What do you think if I change the patch so that exce_cnt is reset to
>>>>>>>>>>>>>>>>>>> zero whenever an operation completes successfully? I might as well add a
>>>>>>>>>>>>>>>>>>> module_param that sets the maximum value of exce_cnt, with zero meaning
>>>>>>>>>>>>>>>>>>> the device is never disabled. Please don't get me wrong: I don't want to
>>>>>>>>>>>>>>>>>>> force this patch, I just want to learn how all this works and, in the
>>>>>>>>>>>>>>>>>>> process, try to make it better. :-)
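
The revised policy proposed above (count exceptions, reset the count on a
successful operation, and treat a limit of zero as "never disable") can be
sketched in plain userspace C. This is only a model under stated
assumptions: struct exc_policy, struct exc_state and both functions are
hypothetical names and are not part of libata. The "greater than" test
mirrors the "exce_cnt > 2" check in the quoted patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names exist in libata. */
struct exc_policy {
	int max_exceptions;	/* 0 means "never disable the device" */
};

struct exc_state {
	int exce_cnt;		/* exceptions seen since the last success */
	bool disabled;
};

/* Record one failed command; returns true if the device is now disabled. */
static bool exc_record_failure(struct exc_state *st, const struct exc_policy *p)
{
	assert(p->max_exceptions >= 0);
	if (st->disabled)
		return true;
	st->exce_cnt++;
	/* Same comparison as the quoted patch: disable once the limit is exceeded. */
	if (p->max_exceptions != 0 && st->exce_cnt > p->max_exceptions)
		st->disabled = true;
	return st->disabled;
}

/* A command completed successfully: forget the earlier failures. */
static void exc_record_success(struct exc_state *st)
{
	if (!st->disabled)
		st->exce_cnt = 0;
}
```

With max_exceptions set to 2, a device that fails twice, succeeds once and
then fails twice more stays enabled; only a third consecutive failure
disables it. As the reply below points out, this still leaves open which
limit is appropriate.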
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> That would be better, but I think you're still going to have an issue
>>>>>>>>>>>>>>>>>> with what magic number to pick to avoid disabling devices inappropriately.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Conceptually, disabling the device doesn't really make sense anyway.
>>>>>>>>>>>>>>>>>> If someone in userspace wants to keep trying to read from that device,
>>>>>>>>>>>>>>>>>> why would you stop them because of some arbitrary judgement? The
>>>>>>>>>>>>>>>>>> kernel itself isn't "locked up" during this process; anything not
>>>>>>>>>>>>>>>>>> blocked on I/O to that device should be able to continue running, so
>>>>>>>>>>>>>>>>>> that process is only hurting itself. If the system fails to boot from
>>>>>>>>>>>>>>>>>> another device due to this, this would likely point out some kind of
>>>>>>>>>>>>>>>>>> problem in userspace or the distro boot process being overly serialized.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I have been booting with the initramfs from Ubuntu 13.04, and I have
>>>>>>>>>>>>>>>>> also tried to boot with the Ubuntu install CD. Neither could continue
>>>>>>>>>>>>>>>>> the boot process. I'm going to spend the weekend trying to figure out
>>>>>>>>>>>>>>>>> where and why the interrupts don't happen, and whether it is a routing
>>>>>>>>>>>>>>>>> issue or a hardware issue (which I highly doubt, since Windows XP SP2
>>>>>>>>>>>>>>>>> was able to boot up without errors).
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Are you able to get out full dmesg output from a boot attempt and the
>>>>>>>>>>>>>>>> contents of /proc/interrupts?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> As I said before, I am not able to get to the shell without my
>>>>>>>>>>>>>>> 'symptom cure'. With my patch I get the following dmesg output, with
>>>>>>>>>>>>>>> some of my debug messages turned off:
>>>>>>>>>>>>>>> http://pastebin.com/5eb5G3Dx
>>>>>>>>>>>>>>> /proc/interrupts is here:
>>>>>>>>>>>>>>> http://pastebin.com/84CJey2D
>>>>>>>>>>>>>>> After yesterday's research, I have come to ata_piix.c. That file looks
>>>>>>>>>>>>>>> like the real culprit, as my netbook's controller is an Intel ICH7M one.
>>>>>>>>>>>>>>> The values I am getting from the device are very different from those
>>>>>>>>>>>>>>> that are expected.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Things I have noticed, but ignored in dmesg:
>>>>>>>>>>>>>>> There is a stack dump, because nobody cared about IRQ#20. I have ignored
>>>>>>>>>>>>>>> this because it is the EHCI IRQ, and I suppose it has nothing to do with
>>>>>>>>>>>>>>> ata. The problem is with ata3 or /dev/sdc, while the IRQ happens
>>>>>>>>>>>>>>> with /dev/sda, which works fine.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I think it is likely related to the problem. The kernel thinks this
>>>>>>>>>>>>>> controller is on IRQ 16, but apparently something is raising
>>>>>>>>>>>>>> un-acknowledged interrupts on IRQ 20 and nothing is coming in on IRQ
>>>>>>>>>>>>>> 16. It seems quite likely that this is actually the ATA controller.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> You mentioned that Windows XP was able to work in this mode. I wonder
>>>>>>>>>>>>>> if it was using the IOAPIC, as if not then the IRQ routing is
>>>>>>>>>>>>>> different which might mask the problem. Do you know what IRQ Device
>>>>>>>>>>>>>> Manager reported for this controller in Windows? And was it using any
>>>>>>>>>>>>>> IRQs over 15 (which would indicate the IOAPIC was in use)?
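
The check described above (is anything arriving on the IRQ the kernel
expects, and is the IRQ number above 15?) can be sketched as a small parser
for /proc/interrupts-style lines. It is a minimal sketch, assuming the
usual layout of "IRQ number, colon, one count per CPU, then the controller
and handler names". parse_irq_line and irq_implies_ioapic are hypothetical
helper names, and the "over 15" rule is simply the heuristic stated above.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Parse one line in /proc/interrupts style.  On success, store the IRQ
 * number and the sum of the per-CPU counts.  Returns false for lines that
 * do not start with a numeric IRQ (the CPU header, "NMI:", and so on).
 */
static bool parse_irq_line(const char *line, int *irq, unsigned long long *total)
{
	char *end;
	long n;

	assert(line != NULL && irq != NULL && total != NULL);
	while (isspace((unsigned char)*line))
		line++;
	if (!isdigit((unsigned char)*line))
		return false;
	n = strtol(line, &end, 10);
	if (*end != ':')
		return false;
	*irq = (int)n;
	*total = 0;
	line = end + 1;
	for (;;) {
		unsigned long long count;

		while (isspace((unsigned char)*line))
			line++;
		if (!isdigit((unsigned char)*line))
			break;	/* reached the controller name */
		count = strtoull(line, &end, 10);
		/* A count must be a whole token, not the start of a name. */
		if (*end != '\0' && !isspace((unsigned char)*end))
			break;
		*total += count;
		line = end;
	}
	return true;
}

/* Heuristic from the discussion above: IRQs over 15 imply the IOAPIC is in use. */
static bool irq_implies_ioapic(int irq)
{
	return irq > 15;
}
```

Feeding it the ata_piix line and the IRQ 20 line from /proc/interrupts
would show whether the first has a total of zero while the second keeps
counting.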
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hmm, according to WinXP's Device Manager, this controller
>>>>>>>>>>>>> listens on IRQ 20, and therefore it is using the I/O APIC.
>>>>>>>>>>>>> Now one question remains: where is the error that maps the
>>>>>>>>>>>>> controller to the wrong IRQ?
>>>>>>>>>>>>> I have created a simple patch which seems to fix this:
>>>>>>>>>>>>> ---
>>>>>>>>>>>>> @@ -1704,6 +1767,8 @@ static int piix_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
>>>>>>>>>>>>>                  hpriv->map = piix_init_sata_map(pdev, port_info,
>>>>>>>>>>>>>                                                  piix_map_db_table[ent->driver_data]);
>>>>>>>>>>>>>
>>>>>>>>>>>>> +       if(pdev->vendor == 0x8086 && pdev->device == 0x27C4)
>>>>>>>>>>>>> +               pdev->irq = 20;
>>>>>>>>>>>>>          rc = ata_pci_bmdma_prepare_host(pdev, ppi, &host);
>>>>>>>>>>>>>          if (rc)
>>>>>>>>>>>>>                  return rc;
>>>>>>>>>>>>>
>>>>>>>>>>>>> However, I am more than sure that this is not the way
>>>>>>>>>>>>> to solve this problem. Do you have any idea on where
>>>>>>>>>>>>> the ideal place would be to implement a fix?
>>>>>>>>>>>>> According to the specs of the ICH7M, which is essentially the
>>>>>>>>>>>>> same as the ICH6M, we need to check which interrupt pin
>>>>>>>>>>>>> the SATA controller uses, then check which IRQ line that pin
>>>>>>>>>>>>> is connected to on the I/O APIC, and derive the IRQ number
>>>>>>>>>>>>> from those findings.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Specs of ICH7:
>>>>>>>>>>>>>
>>>>>>>>>>>>> http://www.intel.com/content/dam/doc/datasheet/i-o-controller-hub-7-datasheet.pdf
>>>>>>>>>>>>> Device 31 Interrupt Route Register: Chapter 7.1.46
>>>>>>>>>>>>> Device 31 Interrupt Pin Register: Chapter 7.1.41
>>>>>>>>>>>>>
>>>>>>>>>>>>> The SATA controller is always Device 31.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> It would appear that something is messing up with the ACPI IRQ routing
>>>>>>>>>>>> on this machine that's causing us to think the controller is on the
>>>>>>>>>>>> wrong IRQ. CCing the linux-acpi list to see if anyone has some
>>>>>>>>>>>> additional debugging suggestions. I suspect that dumping the DSDT is
>>>>>>>>>>>> likely the first step though. If you can get IASL installed, you can
>>>>>>>>>>>> do something like:
>>>>>>>>>>>>
>>>>>>>>>>>> cat /sys/firmware/acpi/tables/DSDT > dsdt.aml
>>>>>>>>>>>> iasl -d dsdt.aml
>>>>>>>>>>>>
>>>>>>>>>>>> That should spit out a dsdt.dsl file which would hopefully have the
>>>>>>>>>>>> info needed to figure out what's going on.
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Here is the disassembled DSDT table:
>>>>>>>>>>> http://pastebin.com/LWNVht9H
>>>>>>>>>>> The SATA controller is at line 5206.
>>>>>>>>>>> I also disassembled the SSDT, but nothing interesting was there:
>>>>>>>>>>> http://pastebin.com/fus5sxU8
>>>>>>>>>>>
>>>>>>>>>>> I disabled the usage of ACPI for IRQs with acpi=noirq,
>>>>>>>>>>> and it successfully booted up setting itself to IRQ#3.
>>>>>>>>>>> This makes me think that this is the BIOS's fault.
>>>>>>>>>>> I think it would be possible to create a DMI check
>>>>>>>>>>> and forcibly set the irq to 20 if the DMI matches.
>>>>>>>>>>> Any comments on this?
>>>>>>>>>>
>>>>>>>>>> The BIOS may be doing something funky, but since Windows apparently
>>>>>>>>>> can figure out it's on IRQ 20, Linux presumably should be able to as
>>>>>>>>>> well. DMI checks should be the last resort - Windows almost certainly
>>>>>>>>>> doesn't have any machine-specific logic here, and it's hard to tell
>>>>>>>>>> what other machine models could be affected. With ACPI stuff, we
>>>>>>>>>> generally just need to do the same thing Windows does for things to
>>>>>>>>>> work reliably, and DMI checks are more of a hack workaround than a
>>>>>>>>>> real fix.
>>>>>>>>>>
>>>>>>>>>> I'll try and have a look at the DSDT within the next few days and see
>>>>>>>>>> if I can figure anything out, unless someone beats me to it.
>>>>>>>>>
>>>>>>>>> I haven't gone into too much detail, but one thing I noticed with the
>>>>>>>>> DSDT is that there appear to be some _OSI checks for Windows 2006
>>>>>>>>> (i.e. Vista) that seem to affect various things, including potentially
>>>>>>>>> the PCI IRQ routing table. It's possible that their IRQ routing table
>>>>>>>>> is broken for legacy mode with an ACPI OS supporting Vista (as current
>>>>>>>>> Linux versions do). Could be this slipped through testing if they only
>>>>>>>>> tested AHCI mode with Vista installed.
>>>>>>>>>
>>>>>>>>> You can try booting with the kernel parameters
>>>>>>>>>
>>>>>>>>> acpi_osi=! acpi_osi="Windows 2001 SP3"
>>>>>>>>>
>>>>>>>>> That should make the BIOS think we are Windows XP and bypass the Vista
>>>>>>>>> code path. If that works, then you might want to check for a BIOS
>>>>>>>>> update on this machine.
>>>>>>>>>
>>>>>>>>
>>>>>>>> First of all, sorry for the late reply. I was kinda busy.
>>>>>>>>
>>>>>>>> I tried what you suggested but unfortunately the problem persists.
>>>>>>>> This makes me believe that Windows XP does have some kind of DMI check here.
>>>>>>>> Of course, while a BIOS update may solve this, I would prefer that Linux
>>>>>>>> also be able to boot up with this broken BIOS.
>>>>>>>>
>>>>>>>> If you are certain that WinXP doesn't use DMI checks,
>>>>>>>> it could be that WinXP's driver of ICH7M's SATA controller applies
>>>>>>>> a quirk and sets that irq line to #20.
>>>>>>>
>>>>>>> Can you post the dmesg output from a bootup attempt with those options?
>>>>>>>
>>>>>>> You may also want to try adding just: acpi_osi=!
>>>>>>>
>>>>>>
>>>>>> None of the three combinations booted successfully.
>>>>>>
>>>>>> Here are a couple of dmesgs:
>>>>>>
>>>>>> Params: acpi_osi="Windows 2001 SP3"
>>>>>> http://pastebin.com/vF3BSuhc
>>>>>>
>>>>>> Params: acpi_osi=! acpi_osi="Windows 2001 SP3"
>>>>>> http://pastebin.com/BuUzc3es
>>>>>>
>>>>>> Params: acpi_osi=!
>>>>>> http://pastebin.com/u7uRx8Ru
>>>>>
>>>>> I'm not sure the option is actually taking effect properly. There
>>>>> should be a message "Disabled all _OSI OS vendors" that shows up in
>>>>> dmesg with the ! option. Can you try:
>>>>>
>>>>> acpi_osi="!" acpi_osi="Windows 2001 SP3"
>>>>>
>>>>> (with the quotes around the ! character).
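
How a sequence of acpi_osi= options combines can be modelled with a small
set of strings. This is a rough sketch of the documented option forms (a
bare "!" drops every built-in vendor string, "!name" removes one string, a
plain name adds one), not the kernel's actual parser. struct osi_set and
the osi_* functions are hypothetical names, and treating the empty value
like "!" is a simplifying assumption that may not match the real
behaviour exactly.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define OSI_MAX 16

/* Hypothetical model of the _OSI strings the kernel answers "yes" to. */
struct osi_set {
	const char *names[OSI_MAX];
	int count;
};

static bool osi_has(const struct osi_set *s, const char *name)
{
	for (int i = 0; i < s->count; i++)
		if (strcmp(s->names[i], name) == 0)
			return true;
	return false;
}

static void osi_add(struct osi_set *s, const char *name)
{
	if (!osi_has(s, name)) {
		assert(s->count < OSI_MAX);
		s->names[s->count++] = name;
	}
}

static void osi_remove(struct osi_set *s, const char *name)
{
	for (int i = 0; i < s->count; i++) {
		if (strcmp(s->names[i], name) == 0) {
			/* Order does not matter: move the last entry into the gap. */
			s->names[i] = s->names[--s->count];
			return;
		}
	}
}

/* Apply one acpi_osi= value, with bootloader quotes already stripped. */
static void osi_apply(struct osi_set *s, const char *value)
{
	if (value[0] == '\0' || strcmp(value, "!") == 0)
		s->count = 0;			/* assumption: both forms drop everything */
	else if (value[0] == '!')
		osi_remove(s, value + 1);	/* "!name": stop claiming one string */
	else
		osi_add(s, value);		/* "name": claim one more string */
}
```

Applying "!" and then "Windows 2001 SP3" to a set that initially contains
"Windows 2006" leaves only the XP string, which is the state the suggested
command line is meant to reach.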
>>>>>
>>>>
>>>> The following command line worked:
>>>> acpi_osi= acpi_osi="Windows 2001 SP3"
>>>>
>>>> So, it seems that the BIOS is broken. Is there any way to fix this,
>>>> without resorting to the hackish DMI checks?
>>>
>>> Probably not really. Have you checked for a newer BIOS version on this machine?
>>>
>>> If not, this is likely similar to a number of other systems listed in
>>> acpi_osi_dmi_table in drivers/acpi/blacklist.c which need to disable
>>> reporting Vista support.
>>>
>>
>>
>> Yup, the attached patch fixed it.
>> I will post it a little bit later, mind if I add your signed-off-by line? :)
>>
>> I would do a BIOS update and see if it was fixed there, but it seems that Toshiba's
>> BIOS updater and the BIOS itself cause more trouble than the problems they fix.
> 
> Sorry for the delay. Seems OK to me. When you submit the patch you
> should include a link to this thread in the commit message, so someone
> in the future would have a hope of knowing why this quirk is in here.

Yes, a comment explaining why this blacklist entry is needed would be
good. Also, does the system-wide _OSI change have any other negative
effect on this system, e.g. do the hotkeys for
backlight/bluetooth/suspend/etc. still work?

Thanks,
Aaron

> 
> You can add my:
> 
> Reviewed-by: Robert Hancock <hancockrwd@xxxxxxxxx>
> 
>> ---
>> diff --git a/drivers/acpi/blacklist.c b/drivers/acpi/blacklist.c
>> index cb96296..34d4d1a 100644
>> --- a/drivers/acpi/blacklist.c
>> +++ b/drivers/acpi/blacklist.c
>> @@ -267,6 +267,14 @@ static struct dmi_system_id acpi_osi_dmi_table[] __initdata = {
>>  		     DMI_MATCH(DMI_PRODUCT_NAME, "Satellite P305D"),
>>  		},
>>  	},
>> +	{
>> +	.callback = dmi_disable_osi_vista,
>> +	.ident = "Toshiba NB100",
>> +	.matches = {
>> +		     DMI_MATCH(DMI_SYS_VENDOR, "TOSHIBA"),
>> +		     DMI_MATCH(DMI_PRODUCT_NAME, "NB100"),
>> +		},
>> +	},
>>
>>  	/*
>>  	 * BIOS invocation of _OSI(Linux) is almost always a BIOS bug.
>>
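
What the new table entry does can be illustrated with a userspace model of
a DMI quirk table: walk the entries and pick the first one whose vendor
and product strings both match the running system. This is a minimal
sketch, not the kernel implementation. struct sys_info, struct quirk and
find_quirk are hypothetical stand-ins for the kernel's DMI structures, and
the substring comparison reflects my understanding of how DMI_MATCH
behaves rather than something stated in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the kernel's DMI data and quirk entries. */
struct sys_info {
	const char *vendor;
	const char *product;
};

struct quirk {
	const char *ident;		/* human-readable name of the entry */
	const char *vendor_substr;	/* must occur in the system vendor */
	const char *product_substr;	/* must occur in the product name */
};

/* Assumption: matching is by substring, not by exact comparison. */
static bool quirk_matches(const struct quirk *q, const struct sys_info *sys)
{
	return strstr(sys->vendor, q->vendor_substr) != NULL &&
	       strstr(sys->product, q->product_substr) != NULL;
}

/* Return the first matching entry, or NULL if the system is not listed. */
static const struct quirk *find_quirk(const struct quirk *table, size_t n,
				      const struct sys_info *sys)
{
	assert(table != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (quirk_matches(&table[i], sys))
			return &table[i];
	return NULL;
}
```

If matching really is by substring, an entry for "NB100" would also apply
to any product name that merely contains "NB100", which is worth keeping
in mind when choosing the match strings.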
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
