Re: [PATCH] BIOS SATA legacy mode failure

Levente Kurusa <levex@xxxxxxxxx> · Tue, 22 Oct 2013 16:32:07 +0200

On 2013-10-22 04:12, Aaron Lu wrote:
> On 10/22/2013 09:34 AM, Robert Hancock wrote:
>> On 10/16/2013 08:42 AM, Levente Kurusa wrote:
>>> On 2013-10-16 02:16, Robert Hancock wrote:
>>>> On Sun, Oct 13, 2013 at 6:02 AM, Levente Kurusa <levex@xxxxxxxxx> wrote:
>>>>> On 2013-10-13 07:57, Robert Hancock wrote:
>>>>>> On Sat, Oct 12, 2013 at 3:29 AM, Levente Kurusa <levex@xxxxxxxxx> wrote:
>>>>>>> On 2013-10-12 04:06, Robert Hancock wrote:
>>>>>>>> On Fri, Oct 11, 2013 at 10:07 AM, Levente Kurusa <levex@xxxxxxxxx> wrote:
>>>>>>>>> On 2013-10-01 06:25, Robert Hancock wrote:
>>>>>>>>>> On Sat, Sep 28, 2013 at 7:21 PM, Robert Hancock <hancockrwd@xxxxxxxxx> wrote:
>>>>>>>>>>> On Sat, Sep 28, 2013 at 11:46 AM, Levente Kurusa <levex@xxxxxxxxx> wrote:
>>>>>>>>>>>> On 2013-09-28 06:55, Robert Hancock wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> On Fri, Sep 27, 2013 at 7:24 AM, Levente Kurusa <levex@xxxxxxxxx> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 2013-09-25 08:31, Robert Hancock wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Sun, Sep 22, 2013 at 1:13 AM, Levente Kurusa <levex@xxxxxxxxx> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 2013-09-21 19:04, Robert Hancock wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Sat, Sep 21, 2013 at 1:35 AM, Levente Kurusa <levex@xxxxxxxxx>
>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> The following dmesg is stuck in an infinite loop.
>>>>>>>>>>>>>>>>>>>>>>>> dmesg:
>>>>>>>>>>>>>>>>>>>>>>>> ata3: lost interrupt (Status 0x50)
>>>>>>>>>>>>>>>>>>>>>>>> ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
>>>>>>>>>>>>>>>>>>>>>>>> ata3.00: failed command: READ DMA
>>>>>>>>>>>>>>>>>>>>>>>> ata3.00: cmd c8/00:08:00:00:00/00:00:00:00:00/e0 tag 0 dma 4096 in
>>>>>>>>>>>>>>>>>>>>>>>>                     res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
>>>>>>>>>>>>>>>>>>>>>>>> ata3.00: status: { DRDY }
>>>>>>>>>>>>>>>>>>>>>>>> ata3: soft resetting link
>>>>>>>>>>>>>>>>>>>>>>>> ata3.00: configured for UDMA/33 (no error)
>>>>>>>>>>>>>>>>>>>>>>>> ata3.00: device reported invalid CHS sector 0
>>>>>>>>>>>>>>>>>>>>>>>> ata3: EH complete
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Patch that fixes the infinite loop:
>>>>>>>>>>>>>>>>>>>>>>>> diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
>>>>>>>>>>>>>>>>>>>>>>>> index f9476fb..eeedf80 100644
>>>>>>>>>>>>>>>>>>>>>>>> --- a/drivers/ata/libata-eh.c
>>>>>>>>>>>>>>>>>>>>>>>> +++ b/drivers/ata/libata-eh.c
>>>>>>>>>>>>>>>>>>>>>>>> @@ -2437,6 +2437,14 @@ static void ata_eh_link_report(struct ata_link *link)
>>>>>>>>>>>>>>>>>>>>>>>>                                   ehc->i.action, frozen, tries_buf);
>>>>>>>>>>>>>>>>>>>>>>>>                       if (desc)
>>>>>>>>>>>>>>>>>>>>>>>>                               ata_dev_err(ehc->i.dev, "%s\n", desc);
>>>>>>>>>>>>>>>>>>>>>>>> +               ehc->i.dev->exce_cnt ++;
>>>>>>>>>>>>>>>>>>>>>>>> +               ata_dev_warn(ehc->i.dev, "Number of exceptions: %d\n", ehc->i.dev->exce_cnt);
>>>>>>>>>>>>>>>>>>>>>>>> +               /**
>>>>>>>>>>>>>>>>>>>>>>>> +                 * The device is failing terribly,
>>>>>>>>>>>>>>>>>>>>>>>> +                 * disable it to prevent damage.
>>>>>>>>>>>>>>>>>>>>>>>> +                 */
>>>>>>>>>>>>>>>>>>>>>>>> +               if(ehc->i.dev->exce_cnt > 2)
>>>>>>>>>>>>>>>>>>>>>>>> +                       ata_dev_disable(ehc->i.dev);
>>>>>>>>>>>>>>>>>>>>>>>>               } else {
>>>>>>>>>>>>>>>>>>>>>>>>                       ata_link_err(link, "exception Emask 0x%x "
>>>>>>>>>>>>>>>>>>>>>>>>                                    "SAct 0x%x SErr 0x%x action 0x%x%s%s\n",
>>>>>>>>>>>>>>>>>>>>>>>> diff --git a/include/linux/libata.h b/include/linux/libata.h
>>>>>>>>>>>>>>>>>>>>>>>> index eae7a05..fa52ee6 100644
>>>>>>>>>>>>>>>>>>>>>>>> --- a/include/linux/libata.h
>>>>>>>>>>>>>>>>>>>>>>>> +++ b/include/linux/libata.h
>>>>>>>>>>>>>>>>>>>>>>>> @@ -660,7 +660,8 @@ struct ata_device {
>>>>>>>>>>>>>>>>>>>>>>>>               u8                      devslp_timing[ATA_LOG_DEVSLP_SIZE];
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>               /* error history */
>>>>>>>>>>>>>>>>>>>>>>>> -       int                     spdn_cnt;
>>>>>>>>>>>>>>>>>>>>>>>> +       int                     spdn_cnt; /* Number of speed_downs */
>>>>>>>>>>>>>>>>>>>>>>>> +       int                     exce_cnt; /* Number of exceptions that happened */
>>>>>>>>>>>>>>>>>>>>>>>>               /* ering is CLEAR_END, read comment above CLEAR_END */
>>>>>>>>>>>>>>>>>>>>>>>>               struct ata_ering        ering;
>>>>>>>>>>>>>>>>>>>>>>>>        };
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> This doesn't seem like a very good fix. It may prevent the apparent
>>>>>>>>>>>>>>>>>>>>>>> infinite loop but will just prevent that device from functioning at all.
>>>>>>>>>>>>>>>>>>>>>>> It would be better if we could figure out what was actually going wrong.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> I have tested the problem with three different computers, all switched
>>>>>>>>>>>>>>>>>>>>>> to legacy/IDE/compatibility mode, and they didn't have this problem. Of
>>>>>>>>>>>>>>>>>>>>>> course, they could have been set to AHCI mode, and there the kernel would
>>>>>>>>>>>>>>>>>>>>>> boot normally. Feels strange, but so far I was only able to reproduce the
>>>>>>>>>>>>>>>>>>>>>> problem with a Toshiba MK8052GSX. On the topic of my patch, I still don't
>>>>>>>>>>>>>>>>>>>>>> see why a device which fails so terribly that it reports 3 exceptions
>>>>>>>>>>>>>>>>>>>>>> shouldn't be disabled. Like in this case, it could cause infinite loops.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> The problem is that this could happen in some cases when you wouldn't
>>>>>>>>>>>>>>>>>>>>> want to disable the device, like an error that just happens
>>>>>>>>>>>>>>>>>>>>> sporadically and works on retry, or a device you're trying to recover
>>>>>>>>>>>>>>>>>>>>> data from.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> What do you think if I edit the patch so that exce_cnt is reset to zero
>>>>>>>>>>>>>>>>>>>> whenever an operation completes successfully? I might as well add a
>>>>>>>>>>>>>>>>>>>> module_param that sets the maximum value of exce_cnt, with zero meaning
>>>>>>>>>>>>>>>>>>>> the device is never disabled. Please don't get me wrong, I don't want to
>>>>>>>>>>>>>>>>>>>> force this patch, I just want to learn how all this works, and in the
>>>>>>>>>>>>>>>>>>>> process try to make it better. :-)
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> That would be better, but I think you're still going to have an issue
>>>>>>>>>>>>>>>>>>> with what magic number to pick to avoid disabling devices
>>>>>>>>>>>>>>>>>>> inappropriately.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Conceptually, disabling the device doesn't really make sense anyway.
>>>>>>>>>>>>>>>>>>> If someone in userspace wants to keep trying to read from that device,
>>>>>>>>>>>>>>>>>>> why would you stop them because of some arbitrary judgement? The
>>>>>>>>>>>>>>>>>>> kernel itself isn't "locked up" during this process, anything not
>>>>>>>>>>>>>>>>>>> blocked on I/O to that device should be able to continue running, so
>>>>>>>>>>>>>>>>>>> that process is only hurting itself. If the system fails to boot from
>>>>>>>>>>>>>>>>>>> another device due to this, this would likely point out some kind of
>>>>>>>>>>>>>>>>>>> problem in userspace or the distro boot process being overly
>>>>>>>>>>>>>>>>>>> serialized.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I have been booting up with the initramfs from Ubuntu 13.04,
>>>>>>>>>>>>>>>>>> and I have also tried to boot with the Ubuntu install CD. Neither
>>>>>>>>>>>>>>>>>> could continue the boot process. I'm going to spend the weekend trying
>>>>>>>>>>>>>>>>>> to figure out where and why the interrupts don't happen. It may be a
>>>>>>>>>>>>>>>>>> routing issue or a hardware issue, though I highly doubt the latter,
>>>>>>>>>>>>>>>>>> since Windows XP SP2 was able to boot up without errors.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Are you able to get out full dmesg output from a boot attempt and the
>>>>>>>>>>>>>>>>> contents of /proc/interrupts?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> As I said before, I am not able to get to the shell without my
>>>>>>>>>>>>>>>> 'symptom cure'. With my patch I get the following dmesg output, with
>>>>>>>>>>>>>>>> some of my debug messages turned off:
>>>>>>>>>>>>>>>> http://pastebin.com/5eb5G3Dx
>>>>>>>>>>>>>>>> /proc/interrupts is here:
>>>>>>>>>>>>>>>> http://pastebin.com/84CJey2D
>>>>>>>>>>>>>>>> After yesterday's research, I have arrived at ata_piix.c. That file
>>>>>>>>>>>>>>>> looks like the real culprit, as my netbook's controller is an Intel
>>>>>>>>>>>>>>>> ICH7M one. The values I am getting from the device are very different
>>>>>>>>>>>>>>>> from those that are expected.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Things I have noticed in dmesg, but ignored:
>>>>>>>>>>>>>>>> There is a stack dump, because nobody cared about IRQ#20. I have
>>>>>>>>>>>>>>>> ignored this because it is the EHCI IRQ, and I suppose it has nothing
>>>>>>>>>>>>>>>> to do with ata. The problem is with ata3 or /dev/sdc, while the IRQ
>>>>>>>>>>>>>>>> happens with /dev/sda, which works fine.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I think it is likely related to the problem. The kernel thinks this
>>>>>>>>>>>>>>> controller is on IRQ 16, but apparently something is raising
>>>>>>>>>>>>>>> un-acknowledged interrupts on IRQ 20 and nothing is coming in on IRQ
>>>>>>>>>>>>>>> 16. It seems quite likely that this is actually the ATA controller.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> You mentioned that Windows XP was able to work in this mode. I wonder
>>>>>>>>>>>>>>> if it was using the IOAPIC, as if not then the IRQ routing is
>>>>>>>>>>>>>>> different which might mask the problem. Do you know what IRQ Device
>>>>>>>>>>>>>>> Manager reported for this controller in Windows? And was it using any
>>>>>>>>>>>>>>> IRQs over 15 (which would indicate the IOAPIC was in use)?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hmm, according to WinXP's Device Manager, this controller
>>>>>>>>>>>>>> listens on IRQ# 20, and therefore it is using the I/O APIC.
>>>>>>>>>>>>>> Now, one question remains: where is the error that mismaps the
>>>>>>>>>>>>>> controller?
>>>>>>>>>>>>>> I have created a simple patch which seems to fix this:
>>>>>>>>>>>>>> ---
>>>>>>>>>>>>>> @@ -1704,6 +1767,8 @@ static int piix_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
>>>>>>>>>>>>>>                  hpriv->map = piix_init_sata_map(pdev, port_info,
>>>>>>>>>>>>>>                                                  piix_map_db_table[ent->driver_data]);
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> +       if(pdev->vendor == 0x8086 && pdev->device == 0x27C4)
>>>>>>>>>>>>>> +               pdev->irq = 20;
>>>>>>>>>>>>>>          rc = ata_pci_bmdma_prepare_host(pdev, ppi, &host);
>>>>>>>>>>>>>>          if (rc)
>>>>>>>>>>>>>>                  return rc;
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> However, I am more than sure that this is not the way
>>>>>>>>>>>>>> to solve this problem. Do you have any idea on where
>>>>>>>>>>>>>> the ideal place would be to implement a fix?
>>>>>>>>>>>>>> According to specs of ICH7M, which is essentially the
>>>>>>>>>>>>>> same as ICH6M, we need to check on what interrupt pin
>>>>>>>>>>>>>> is the SATA controller, and after that check which IRQ line
>>>>>>>>>>>>>> is connected to the I/O APIC and decide the IRQ's number
>>>>>>>>>>>>>> on those findings.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Specs of ICH7:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> http://www.intel.com/content/dam/doc/datasheet/i-o-controller-hub-7-datasheet.pdf
>>>>>>>>>>>>>> Device 31 Interrupt Route Register: Chapter 7.1.46
>>>>>>>>>>>>>> Device 31 Interrupt Pin Register: Chapter 7.1.41
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The SATA controller is always Device 31.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> It would appear that something is messing up with the ACPI IRQ routing
>>>>>>>>>>>>> on this machine that's causing us to think the controller is on the
>>>>>>>>>>>>> wrong IRQ. CCing the linux-acpi list to see if anyone has some
>>>>>>>>>>>>> additional debugging suggestions. I suspect that dumping the DSDT is
>>>>>>>>>>>>> likely the first step though. If you can get IASL installed, you can
>>>>>>>>>>>>> do something like:
>>>>>>>>>>>>>
>>>>>>>>>>>>> cat /sys/firmware/acpi/tables/DSDT > dsdt.aml
>>>>>>>>>>>>> iasl -d dsdt.aml
>>>>>>>>>>>>>
>>>>>>>>>>>>> That should spit out a dsdt.dsl file which would hopefully have the
>>>>>>>>>>>>> info needed to figure out what's going on.
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Here is the disassembled DSDT table:
>>>>>>>>>>>> http://pastebin.com/LWNVht9H
>>>>>>>>>>>> The SATA controller is at line 5206.
>>>>>>>>>>>> I also disassembled the SSDT, but nothing interesting was there:
>>>>>>>>>>>> http://pastebin.com/fus5sxU8
>>>>>>>>>>>>
>>>>>>>>>>>> I disabled the usage of ACPI for IRQs with acpi=noirq,
>>>>>>>>>>>> and it successfully booted up setting itself to IRQ#3.
>>>>>>>>>>>> This makes me think that this is the BIOS's fault.
>>>>>>>>>>>> I think it would be possible to create a DMI check
>>>>>>>>>>>> and forcibly set the irq to 20 if the DMI matches.
>>>>>>>>>>>> Any comments on this?
>>>>>>>>>>>
>>>>>>>>>>> The BIOS may be doing something funky, but since Windows apparently
>>>>>>>>>>> can figure out it's on IRQ 20, Linux presumably should be able to as
>>>>>>>>>>> well. DMI checks should be the last resort - Windows almost certainly
>>>>>>>>>>> doesn't have any machine-specific logic here, and it's hard to tell
>>>>>>>>>>> what other machine models could be affected. With ACPI stuff, we
>>>>>>>>>>> generally just need to do the same thing Windows does for things to
>>>>>>>>>>> work reliably, and DMI checks are more of a hack workaround than a
>>>>>>>>>>> real fix.
>>>>>>>>>>>
>>>>>>>>>>> I'll try and have a look at the DSDT within the next few days and see
>>>>>>>>>>> if I can figure anything out, unless someone beats me to it.
>>>>>>>>>>
>>>>>>>>>> I haven't gone into too much detail, but one thing I noticed with the
>>>>>>>>>> DSDT is that there appear to be some _OSI checks for Windows 2006
>>>>>>>>>> (i.e. Vista) that seem to affect various things, including potentially
>>>>>>>>>> the PCI IRQ routing table. It's possible that their IRQ routing table
>>>>>>>>>> is broken for legacy mode with an ACPI OS supporting Vista (as current
>>>>>>>>>> Linux versions do). Could be this slipped through testing if they only
>>>>>>>>>> tested AHCI mode with Vista installed.
>>>>>>>>>>
>>>>>>>>>> You can try booting with the kernel parameters
>>>>>>>>>>
>>>>>>>>>> acpi_osi=! acpi_osi="Windows 2001 SP3"
>>>>>>>>>>
>>>>>>>>>> That should make the BIOS think we are Windows XP and bypass the Vista
>>>>>>>>>> code path. If that works, then you might want to check for a BIOS
>>>>>>>>>> update on this machine.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> First of all, sorry for the late reply. I was kinda busy.
>>>>>>>>>
>>>>>>>>> I tried what you suggested but unfortunately the problem persists.
>>>>>>>>> This makes me believe that Windows XP does have some kind of DMI check here.
>>>>>>>>> Of course, while a BIOS update may solve this, I would prefer that Linux
>>>>>>>>> also be able to boot with this broken BIOS.
>>>>>>>>>
>>>>>>>>> If you are certain that WinXP doesn't use DMI checks,
>>>>>>>>> it could be that WinXP's driver of ICH7M's SATA controller applies
>>>>>>>>> a quirk and sets that irq line to #20.
>>>>>>>>
>>>>>>>> Can you post the dmesg output from a bootup attempt with those options?
>>>>>>>>
>>>>>>>> You may also want to try adding just: acpi_osi=!
>>>>>>>>
>>>>>>>
>>>>>>> None of the three combinations booted successfully.
>>>>>>>
>>>>>>> Here are a couple of dmesgs:
>>>>>>>
>>>>>>> Params: acpi_osi="Windows 2001 SP3"
>>>>>>> http://pastebin.com/vF3BSuhc
>>>>>>>
>>>>>>> Params: acpi_osi=! acpi_osi="Windows 2001 SP3"
>>>>>>> http://pastebin.com/BuUzc3es
>>>>>>>
>>>>>>> Params: acpi_osi=!
>>>>>>> http://pastebin.com/u7uRx8Ru
>>>>>>
>>>>>> I'm not sure the option is actually taking effect properly. There
>>>>>> should be a message "Disabled all _OSI OS vendors" that shows up in
>>>>>> dmesg with the ! option. Can you try:
>>>>>>
>>>>>> acpi_osi="!" acpi_osi="Windows 2001 SP3"
>>>>>>
>>>>>> (with the quotes around the ! character).
>>>>>>
>>>>>
>>>>> The following command line worked:
>>>>> acpi_osi= acpi_osi="Windows 2001 SP3"
>>>>>
>>>>> So, it seems that the BIOS is broken. Is there any way to fix this,
>>>>> without resorting to the hackish DMI checks?
>>>>
>>>> Probably not really. Have you checked for a newer BIOS version on this machine?
>>>>
>>>> If not, this is likely similar to a number of other systems listed in
>>>> acpi_osi_dmi_table in drivers/acpi/blacklist.c which need to disable
>>>> reporting Vista support.
>>>>
>>>
>>>
>>> Yup, the attached patch fixed it.
>>> I will post it a little bit later, mind if I add your signed-off-by line? :)
>>>
>>> I would do a BIOS update and see if it was fixed there, but it seems that Toshiba's
>>> BIOS updater and the BIOS itself cause more trouble than they fix.
>>
>> Sorry for the delay. Seems OK to me. When you submit the patch you
>> should include a link to this thread in the commit message, so someone
>> in the future would have a hope of knowing why this quirk is in here.
> 
> Yes, a comment explaining why this blacklist entry is needed would be good.
> Also, does that whole-system _OSI change have any other negative effect on
> this system, e.g. do the hotkeys for backlight/bluetooth/suspend/etc. still work?
> 

Yes, everything is in the same state as it was pre-patch, but now IDE mode
also works.

>> You can add my:
>>
>> Reviewed-by: Robert Hancock <hancockrwd@xxxxxxxxx>

Thank you, will add.
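
For anyone reading this thread later, here is a rough, self-contained sketch
of the idea behind a DMI-keyed _OSI quirk: match the machine's DMI strings
against a table, and if it matches, stop claiming support for "Windows 2006"
(Vista). This is not the submitted patch and not the kernel's API. The struct,
the function names and the vendor/product strings are all made up for
illustration; the real table is acpi_osi_dmi_table in
drivers/acpi/blacklist.c.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a DMI quirk table entry. */
struct quirk_entry {
	const char *ident;        /* human-readable label */
	const char *sys_vendor;   /* substring expected in the DMI vendor string */
	const char *product_name; /* substring expected in the DMI product string */
};

/* Placeholder strings: the real vendor/product values are not in this thread. */
static const struct quirk_entry osi_quirk_table[] = {
	{ "Example netbook", "EXAMPLE VENDOR", "EXAMPLE MODEL" },
	{ NULL, NULL, NULL } /* terminator */
};

/* Return the first table entry whose substrings both match, or NULL. */
static const struct quirk_entry *find_quirk(const char *vendor,
					    const char *product)
{
	for (const struct quirk_entry *q = osi_quirk_table; q->ident; q++) {
		if (strstr(vendor, q->sys_vendor) &&
		    strstr(product, q->product_name))
			return q;
	}
	return NULL;
}

/*
 * Decide how to answer an _OSI query. Simplification: every string is
 * reported as supported, except "Windows 2006" on a quirked machine.
 */
static bool osi_supported(const char *osi_string, const char *vendor,
			  const char *product)
{
	if (find_quirk(vendor, product) &&
	    strcmp(osi_string, "Windows 2006") == 0)
		return false;
	return true;
}
```

On a matching machine the firmware would then take its pre-Vista code path,
which is what the acpi_osi= command line did by hand earlier in the thread.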

-- 
Regards,
Levente Kurusa