Re: [2.6.18,19] SATA boot problems (ICH6/ICH6W)

On Wed, Dec 20, 2006 at 09:44:45AM +0900, Tejun Heo wrote:
> Kovid Goyal wrote:
> > Hi,
> >
> > I'm having problems booting from a SATA disk with 2.6.19.
> > Grub loads fine, but when the kernel boots, it *sometimes* ends up with
> > VFS: Cannot open root device "sda5" or unknown-block(0,0).
> >
> > This issue appears to be identical with the one reported in
> > http://lkml.org/lkml/2006/10/19/327 except that 2.6.19 did not fix it for me.
> > In fact, it made it worse. The timeouts reported in that thread are still
> > present and with 2.6.19 the fraction of unsuccessful boots has increased from
> > 30% to 80%. I often have to reboot my machine 3 times in a row before it
> > succeeds. I have tried both vanilla 2.6.19 and gentoo-sources-2.6.19-r1
> >
> > To summarize: My machine stalls for nearly a minute on all boots and the root
> > filesystem fails to mount on 80% of the boots. This problem first manifested
> > itself when I switched to 2.6.18 and has become worse with 2.6.19.
> >
> > The controller:
> > 00:1f.2 IDE interface: Intel Corporation 82801FB/FW (ICH6/ICH6W) SATA
> > Controller (rev 03)
> >
> > I don't know how to get the logs from an unsuccessful boot. Here is
> > an extract from the logs of a successful boot (in which the timeout is
> > present):
> 
> Does going back to 2.6.17 fix the problem?  Also, please report the
> result of 'smartctl -d ata -a /dev/sdX' where sdX is the problematic drive.

We have been seeing what might be the same problem on an 
IBM Intellistation M30 Pro which contains the same SATA controller.

I have been debugging and trying to better understand the problem
before posting my findings and possible fix but after noticing this 
message today and then seeing the "SRST failed (status 0xFF)" messages
included in Kovid's original message I think I should share what I 
have found so far.

In our case the SRST failures are associated with a Quantum GoVault
removable hard drive device.  We have also noticed that the failure
depends on cable placement, e.g. after encountering the failure,
swapping the ports to which the hard drive and GoVault cables
are connected causes the problem to disappear.

The above SRST failure message comes from this code in libata-core.c:

        /* Before we perform post reset processing we want to see if
         * the bus shows 0xFF because the odd clown forgets the D7
         * pulldown resistor.
         */
        if (ata_check_status(ap) == 0xFF) {
                ata_port_printk(ap, KERN_ERR, "SRST failed (status 0xFF)\n");
                return AC_ERR_OTHER;
        }

I noticed that Tejun recently provided a "libata: handle 0xff status 
properly" patch, now in mainline, that improves this code
  re: http://marc.theaimsgroup.com/?l=linux-ide&m=116038642105802&w=2
but I found that with it the check still fails, only more silently 
and with no retries.

I decided to try increasing the delay that precedes the above 
check [ msleep(150); ] and found that a change from 150ms to 
1000ms caused the problem to disappear. 

I then replaced the msleep(150); with:
    {
        int i, ms = 5;
        msleep(ms);
        ata_port_printk(ap, KERN_INFO, "status @ %d ms: 0x%x\n",
                                        ms, ata_check_status(ap));
        for (i = 1; i <= 20; i++) {
            ms += 50;
            msleep(50);
            ata_port_printk(ap, KERN_INFO, "status @ %d ms: 0x%x\n",
                                            ms, ata_check_status(ap));
        }
    }

Output for two cable placement configurations (0xFF check failure 
and 0xFF check success) is included below.  Note that there are 
cable placement configurations for both the hard drive and the 
GoVault where the initial status is 0xff, i.e. both transition 
from 0xff to 0x7f when the BSY bit is cleared, but the GoVault 
takes MUCH longer to do so (600-700ms for the GoVault and <5ms 
for the hard drive).  The 0xff starting status does not appear 
to be device specific.
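
For anyone reading the status values in the logs, here is a small
user-space sketch that names the bits set in a status byte.  The bit
positions follow the classic ATA status register layout; the ST_*
macro names and decode_status() are made up for this illustration
and are not taken from libata.

```c
#include <stdio.h>
#include <string.h>

/* Classic ATA status register bits (names here are illustrative). */
#define ST_BSY  0x80  /* device busy */
#define ST_DRDY 0x40  /* device ready */
#define ST_DF   0x20  /* device fault */
#define ST_DSC  0x10  /* seek complete */
#define ST_DRQ  0x08  /* data request */
#define ST_CORR 0x04  /* corrected data */
#define ST_IDX  0x02  /* index */
#define ST_ERR  0x01  /* error */

/* Write the names of all bits set in `status` into buf, separated by
 * single spaces, highest bit first.  Returns buf. */
char *decode_status(unsigned char status, char *buf, size_t len)
{
    static const struct { unsigned char bit; const char *name; } bits[] = {
        { ST_BSY, "BSY" },   { ST_DRDY, "DRDY" }, { ST_DF, "DF" },
        { ST_DSC, "DSC" },   { ST_DRQ, "DRQ" },   { ST_CORR, "CORR" },
        { ST_IDX, "IDX" },   { ST_ERR, "ERR" },
    };
    size_t i;

    if (len == 0)
        return buf;
    buf[0] = '\0';
    for (i = 0; i < sizeof(bits) / sizeof(bits[0]); i++) {
        if (!(status & bits[i].bit))
            continue;
        /* strncat always terminates; the bound leaves room for '\0'. */
        if (buf[0] != '\0')
            strncat(buf, " ", len - strlen(buf) - 1);
        strncat(buf, bits[i].name, len - strlen(buf) - 1);
    }
    return buf;
}
```

With this, 0x50 decodes to "DRDY DSC" and 0xd0 to "BSY DRDY DSC",
while 0xff has every bit set and 0x7f is the same value with only
BSY cleared.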

So, it appears that we have a situation with this SATA controller 
where a 0xFF status is not an accurate indication that there is
no device.  

Although the 150ms to 1000ms delay increase works for the GoVault 
device, I am not sure it is the best long-term fix for the problem.
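
One possible alternative to a fixed longer delay would be to poll the
status with an upper bound and stop as soon as it is no longer 0xFF.
The following is only a user-space sketch of that idea, not a tested
kernel patch.  The status_fn callback, wait_while_ff() and
sim_status() are hypothetical names; in the kernel the callback's role
would be played by ata_check_status(ap) after an msleep().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback: returns the port status as seen elapsed_ms
 * milliseconds after the reset. */
typedef unsigned char (*status_fn)(void *ctx, int elapsed_ms);

/* Sample the status at first_ms, then every step_ms, up to and
 * including max_ms.  Returns the elapsed time of the first sample
 * that is not 0xFF, or -1 if every sample was 0xFF. */
int wait_while_ff(status_fn read_status, void *ctx,
                  int first_ms, int step_ms, int max_ms)
{
    int ms;

    assert(read_status != NULL && step_ms > 0);
    for (ms = first_ms; ms <= max_ms; ms += step_ms) {
        /* Real code would sleep here instead of passing the time in. */
        if (read_status(ctx, ms) != 0xFF)
            return ms;
    }
    return -1;
}

/* Simulated device for illustration: reports 0xFF until the time
 * pointed to by ctx (in ms) has passed, then 0x7F. */
unsigned char sim_status(void *ctx, int elapsed_ms)
{
    int ready_ms = *(const int *)ctx;

    return elapsed_ms >= ready_ms ? 0x7F : 0xFF;
}
```

With the same 5ms first sample and 50ms step as the debug loop above,
a simulated device that becomes ready at 650ms is detected at the
655ms sample, and a device that is ready immediately is detected at
5ms, so a fast device does not pay for the slow one.  A 150ms bound
would still give up on the slow device.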

Gary

-- 
Gary Hade
IBM Linux Technology Center
503-578-4503  IBM T/L: 775-4503
garyhade@xxxxxxxxxx
http://www.ibm.com/linux/ltc

================================================
Failed 0xFF check cable placement configuration
================================================
libata version 2.00 loaded.
ata_piix 0000:00:1f.2: version 2.00
ata_piix 0000:00:1f.2: MAP [ P0 P2 P1 P3 ]
ACPI: PCI Interrupt 0000:00:1f.2[B] -> GSI 19 (level, low) -> IRQ 225
PCI: Setting latency timer of device 0000:00:1f.2 to 64
ata1: SATA max UDMA/133 cmd 0x20C0 ctl 0x20BA bmdma 0x2090 irq 225
ata2: SATA max UDMA/133 cmd 0x20B0 ctl 0x20A6 bmdma 0x2098 irq 225
scsi0 : ata_piix
ata1: status @ 5 ms: 0xff
ata1: status @ 55 ms: 0xff
ata1: status @ 105 ms: 0xff
ata1: status @ 155 ms: 0xff
ata1: status @ 205 ms: 0xff
ata1: status @ 255 ms: 0xff
ata1: status @ 305 ms: 0xff
ata1: status @ 355 ms: 0xff
ata1: status @ 405 ms: 0xff
ata1: status @ 455 ms: 0xff
ata1: status @ 505 ms: 0xff
ata1: status @ 555 ms: 0xff
ata1: status @ 605 ms: 0xff
ata1: status @ 655 ms: 0x7f
ata1: status @ 705 ms: 0x7f
ata1: status @ 755 ms: 0x7f
ata1: status @ 805 ms: 0x7f
ata1: status @ 855 ms: 0x7f
ata1: status @ 905 ms: 0x7f
ata1: status @ 955 ms: 0x7f
ata1: status @ 1005 ms: 0x7f
ATA: abnormal status 0x7F on port 0x20C7
ATA: abnormal status 0x7F on port 0x20C7
ata1.01: ATAPI, max UDMA/66
ata1.01: configured for UDMA/66
scsi1 : ata_piix
ata2: status @ 5 ms: 0x50
ata2: status @ 55 ms: 0x50
ata2: status @ 105 ms: 0x50
ata2: status @ 155 ms: 0x50
ata2: status @ 205 ms: 0x50
ata2: status @ 255 ms: 0x50
ata2: status @ 305 ms: 0x50
ata2: status @ 355 ms: 0x50
ata2: status @ 405 ms: 0x50
ata2: status @ 455 ms: 0x50
ata2: status @ 505 ms: 0x50
ata2: status @ 555 ms: 0x50
ata2: status @ 605 ms: 0x50
ata2: status @ 655 ms: 0x50
ata2: status @ 705 ms: 0x50
ata2: status @ 755 ms: 0x50
ata2: status @ 805 ms: 0x50
ata2: status @ 855 ms: 0x50
ata2: status @ 905 ms: 0x50
ata2: status @ 955 ms: 0x50
ata2: status @ 1005 ms: 0x50
ata2.00: ATA-7, max UDMA/133, 156312576 sectors: LBA
ata2.00: ata2: dev 0 multi count 16
ata2.00: configured for UDMA/133
  Vendor: IBM       Model: GoVault           Rev: 008F
  Type:   Direct-Access                      ANSI SCSI revision: 05
SCSI device sda: 156300464 512-byte hdwr sectors (80026 MB)
sda: Write Protect is off
sda: Mode Sense: 00 74 00 00
sda: cache data unavailable
sda: assuming drive cache: write through
SCSI device sda: 156300464 512-byte hdwr sectors (80026 MB)
sda: Write Protect is off
sda: Mode Sense: 00 74 00 00
sda: cache data unavailable
sda: assuming drive cache: write through
 sda: sda1 sda2 sda3
sd 0:0:1:0: Attached scsi removable disk sda
  Vendor: ATA       Model: Maxtor 6Y080M0    Rev: YAR5
  Type:   Direct-Access                      ANSI SCSI revision: 05
SCSI device sdb: 156312576 512-byte hdwr sectors (80032 MB)
sdb: Write Protect is off
sdb: Mode Sense: 00 3a 00 00
SCSI device sdb: drive cache: write back
SCSI device sdb: 156312576 512-byte hdwr sectors (80032 MB)
sdb: Write Protect is off
sdb: Mode Sense: 00 3a 00 00
SCSI device sdb: drive cache: write back
 sdb: sdb1 sdb2 sdb3 sdb4 < sdb5 sdb6 sdb7 sdb8 sdb9 sdb10 sdb11 sdb12 >
sd 1:0:0:0: Attached scsi disk sdb
==========================================================

===================================================
Successful 0xFF check cable placement configuration
===================================================
libata version 2.00 loaded.
ata_piix 0000:00:1f.2: version 2.00
ata_piix 0000:00:1f.2: MAP [ P0 P2 P1 P3 ]
ACPI: PCI Interrupt 0000:00:1f.2[B] -> GSI 19 (level, low) -> IRQ 225
PCI: Setting latency timer of device 0000:00:1f.2 to 64
ata1: SATA max UDMA/133 cmd 0x20C0 ctl 0x20BA bmdma 0x2090 irq 225
ata2: SATA max UDMA/133 cmd 0x20B0 ctl 0x20A6 bmdma 0x2098 irq 225
scsi0 : ata_piix
ata1: status @ 5 ms: 0x7f
ata1: status @ 55 ms: 0x7f
ata1: status @ 105 ms: 0x7f
ata1: status @ 155 ms: 0x7f
ata1: status @ 205 ms: 0x7f
ata1: status @ 255 ms: 0x7f
ata1: status @ 305 ms: 0x7f
ata1: status @ 355 ms: 0x7f
ata1: status @ 405 ms: 0x7f
ata1: status @ 455 ms: 0x7f
ata1: status @ 505 ms: 0x7f
ata1: status @ 555 ms: 0x7f
ata1: status @ 605 ms: 0x7f
ata1: status @ 655 ms: 0x7f
ata1: status @ 705 ms: 0x7f
ata1: status @ 755 ms: 0x7f
ata1: status @ 805 ms: 0x7f
ata1: status @ 855 ms: 0x7f
ata1: status @ 905 ms: 0x7f
ata1: status @ 955 ms: 0x7f
ata1: status @ 1005 ms: 0x7f
ATA: abnormal status 0x7F on port 0x20C7
ATA: abnormal status 0x7F on port 0x20C7
ata1.01: ATA-7, max UDMA/133, 156312576 sectors: LBA
ata1.01: ata1: dev 1 multi count 16
ata1.01: configured for UDMA/133
scsi1 : ata_piix
ata2: status @ 5 ms: 0xd0
ata2: status @ 55 ms: 0xd0
ata2: status @ 105 ms: 0xd0
ata2: status @ 155 ms: 0xd0
ata2: status @ 205 ms: 0xd0
ata2: status @ 255 ms: 0xd0
ata2: status @ 305 ms: 0xd0
ata2: status @ 355 ms: 0xd0
ata2: status @ 405 ms: 0xd0
ata2: status @ 455 ms: 0xd0
ata2: status @ 505 ms: 0xd0
ata2: status @ 555 ms: 0xd0
ata2: status @ 605 ms: 0xd0
ata2: status @ 655 ms: 0x0
ata2: status @ 705 ms: 0x0
ata2: status @ 755 ms: 0x0
ata2: status @ 805 ms: 0x0
ata2: status @ 855 ms: 0x0
ata2: status @ 905 ms: 0x0
ata2: status @ 955 ms: 0x0
ata2: status @ 1005 ms: 0x0
ata2.00: ATAPI, max UDMA/66
ata2.00: configured for UDMA/66
  Vendor: ATA       Model: Maxtor 6Y080M0    Rev: YAR5
  Type:   Direct-Access                      ANSI SCSI revision: 05
SCSI device sda: 156312576 512-byte hdwr sectors (80032 MB)
sda: Write Protect is off
sda: Mode Sense: 00 3a 00 00
SCSI device sda: drive cache: write back
SCSI device sda: 156312576 512-byte hdwr sectors (80032 MB)
sda: Write Protect is off
sda: Mode Sense: 00 3a 00 00
SCSI device sda: drive cache: write back
 sda: sda1 sda2 sda3 sda4 < sda5 sda6 sda7 sda8 sda9 sda10 sda11 sda12 >
sd 0:0:1:0: Attached scsi disk sda
  Vendor: IBM       Model: GoVault           Rev: 008F
  Type:   Direct-Access                      ANSI SCSI revision: 05
SCSI device sdb: 156300464 512-byte hdwr sectors (80026 MB)
sdb: Write Protect is off
sdb: Mode Sense: 00 74 00 00
sdb: cache data unavailable
sdb: assuming drive cache: write through
SCSI device sdb: 156300464 512-byte hdwr sectors (80026 MB)
sdb: Write Protect is off
sdb: Mode Sense: 00 74 00 00
sdb: cache data unavailable
sdb: assuming drive cache: write through
 sdb: sdb1 sdb2 sdb3
sd 1:0:0:0: Attached scsi removable disk sdb
==========================================================
