Re: Faulty seagate drives, are going to be blacklisted?

Patrick Horn <patrick.horn@xxxxxxxxx> · Wed, 21 Jan 2009 02:27:38 -0800

Diego Calleja wrote:
Tech sites everywhere are reporting a massive flaw in Seagate drives that
can lock up the drive and make it unusable (the BIOS doesn't detect it, and
you can't read the data). I haven't read anything about it here on the lists.
Seagate has acknowledged the problem:
http://seagate.custkb.com/seagate/crm/selfservice/search.jsp?DocId=207931

So, apparently there are a lot of drives on the market (including mine)
that can die any day. Are those drives going to be blacklisted? It's
still not clear whether the firmware update is safe (some affected but
working drives are dying after the firmware update), so some people,
like me, are still waiting (and hoping that the drive doesn't die) for
more stable firmware updates...

Here is the list of affected drives and firmware, according to the support
site as of now. Some models are still being diagnosed.

Seagate Barracuda 7200.11 (http://seagate.custkb.com/seagate/crm/selfservice/search.jsp?DocId=207951)

Models Affected:
 ST3500320AS
 ST3640330AS
 ST3750330AS
 ST31000340AS
Firmware Affected
 SD15, SD16, SD17, SD18, SD19, AD14
Recommended Firmware Update
 SD1A

Seagate Barracuda 7200.11, page 2 (http://seagate.custkb.com/seagate/crm/selfservice/search.jsp?DocId=207957)
Models Affected:
 ST31500341AS
 ST31000333AS
 ST3640323AS
 ST3640623AS
 ST3320613AS
 ST3320813AS
 ST3160813AS
Firmware Affected
 Still unknown
Recommended Firmware Update
 Still unknown

Seagate Barracuda ES.2 (http://seagate.custkb.com/seagate/crm/selfservice/search.jsp?DocId=207963)
Models Affected:
 ST3250310NS
 ST3500320NS
 ST3750330NS
 ST31000340NS
Firmware Affected
 Still unknown
Recommended Firmware Update
 Still unknown

DiamondMax 22 (http://seagate.custkb.com/seagate/crm/selfservice/search.jsp?DocId=207969)
Models Affected:
 STM3500320AS
 STM3750330AS
 STM31000340AS
Firmware Affected
 MX15 (or higher)
Recommended Firmware Update
 MX1A

DiamondMax 22 (http://seagate.custkb.com/seagate/crm/selfservice/search.jsp?DocId=207975)
Models Affected:
 STM31000334AS
 STM3320614AS
 STM3160813AS
Firmware Affected
 Still unknown
Recommended Firmware Update
 Still unknown
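Checking a particular drive against the lists above is just a table lookup. The following is a minimal Python sketch, assuming the model and firmware strings come from the drive's identify data (for example as shown by hdparm or smartctl). The table mirrors the KB lists quoted above as of January 2009, `check_drive` and `AFFECTED` are hypothetical names, "MX15 (or higher)" is simplified to MX15 only, and a vendor prefix such as "MAXTOR" is assumed to be droppable by keeping only the last word of the model string.

```python
# Sketch: look up a drive in the Seagate KB lists quoted above (January 2009).
# None means the KB still listed the affected firmware as unknown.
AFFECTED = {
    "Barracuda 7200.11": (
        {"ST3500320AS", "ST3640330AS", "ST3750330AS", "ST31000340AS"},
        {"SD15", "SD16", "SD17", "SD18", "SD19", "AD14"},
    ),
    "Barracuda 7200.11, page 2": (
        {"ST31500341AS", "ST31000333AS", "ST3640323AS", "ST3640623AS",
         "ST3320613AS", "ST3320813AS", "ST3160813AS"},
        None,
    ),
    "Barracuda ES.2": (
        {"ST3250310NS", "ST3500320NS", "ST3750330NS", "ST31000340NS"},
        None,
    ),
    "DiamondMax 22": (
        {"STM3500320AS", "STM3750330AS", "STM31000340AS"},
        {"MX15"},  # KB says "MX15 (or higher)"; only MX15 is encoded here
    ),
    "DiamondMax 22 (second list)": (
        {"STM31000334AS", "STM3320614AS", "STM3160813AS"},
        None,
    ),
}


def check_drive(model: str, firmware: str) -> str:
    """Hypothetical helper: classify a model/firmware pair against AFFECTED."""
    # Assumption: a vendor prefix such as "MAXTOR " may precede the model
    # number, so only the last whitespace-separated word is compared.
    model = model.split()[-1].upper()
    firmware = firmware.strip().upper()
    for family, (models, bad_firmware) in AFFECTED.items():
        if model not in models:
            continue
        if bad_firmware is None:
            return f"listed ({family}), affected firmware unknown"
        if firmware in bad_firmware:
            return f"affected ({family})"
        return f"listed ({family}), firmware {firmware} not listed as affected"
    return "not listed"


print(check_drive("MAXTOR STM31000333AS", "MX15"))  # not listed
print(check_drive("ST31000340AS", "SD15"))  # affected (Barracuda 7200.11)
```

Because the lookup is an exact match, the STM31000333AS discussed below comes back as "not listed" even though it differs by one letter from the listed ST31000333AS; the table cannot say whether the two share firmware.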
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Hi,

I have another drive which doesn't seem to be on any list, and a Google search
comes up with very little information about it.

I have two SATA 1TB "MAXTOR STM31000333AS" drives (firmware MX15) in a RAID 1
array, one of which "failed" last weekend. I have since rebuilt the array and it
has had no further problems, but I know it's only a matter of time before it
happens again.

I checked SMART: both drives report essentially identical values, with nothing
anywhere near failure.
I am on Ubuntu kernel 2.6.28-4-generic #5-Ubuntu but I will be happy to build a 
kernel if this becomes at all reproducible.

At first I thought that this NCQ problem might apply to me, but my drive is
(gasp) one letter different from two of those listed (both Seagate and Maxtor
variants):
http://seagate.custkb.com/seagate/crm/selfservice/search.jsp?DocId=207931
Also, MX15 is listed as faulty firmware for the STM31000340AS/334AS.

I had been using these drives for only three weeks when the one drive failed
(it later gave a bunch of errors at bootup, which cleared up once the SATA link
was reset). The other drive has luckily not had any issues.

Is this error just a coincidence, or did Seagate forget to mention my drive?
(And what happened to the firmware updates? They seem to be "In Validation".)
Is Seagate the only source of information about this? Is there a public
blacklist of every affected drive? What can I see in dmesg that would indicate
that NCQ is the cause?

Thanks,
-Patrick

(I'm pasting my dmesg because I don't know enough to tell whether this is the
same issue as with the other Seagate drives; I trimmed the repetitive parts.)

[ 7520.699730] ata2.00: exception Emask 0x10 SAct 0x7ff4f SErr 0x400100 action 0x6 frozen
[ 7520.699734] ata2.00: irq_stat 0x08000000, interface fatal error
[ 7520.699738] ata2: SError: { UnrecovData Handshk }
[ 7520.699743] ata2.00: cmd 61/50:00:89:4b:c0/00:00:01:00:00/40 tag 0 ncq 40960 out
[ 7520.699745]          res 40/00:30:91:60:c0/00:00:01:00:00/40 Emask 0x10 (ATA bus error)
[ 7520.699748] ata2.00: status: { DRDY }
[ 7520.699752] ata2.00: cmd 61/40:08:b1:4f:c0/00:00:01:00:00/40 tag 1 ncq 32768 out
[ 7520.699753]          res 40/00:30:91:60:c0/00:00:01:00:00/40 Emask 0x10 (ATA bus error)
[ 7520.699756] ata2.00: status: { DRDY }
[ 7520.699875] ata2: hard resetting link
[ 7521.180020] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 7521.250673] ata2.00: configured for UDMA/133
[ 7521.250724] ata2: EH complete
[ 7521.250812] sd 1:0:0:0: [sdb] 1953525168 512-byte hardware sectors: (1.00 TB/931 GiB)
[ 7521.250832] sd 1:0:0:0: [sdb] Write Protect is off
[ 7521.250835] sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
[ 7521.250865] sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 7521.258968] ata2.00: exception Emask 0x10 SAct 0x7ffff SErr 0x400100 action 0x6 frozen
[ 7521.258972] ata2.00: irq_stat 0x08000000, interface fatal error
[ 7521.258975] ata2: SError: { UnrecovData Handshk }
... the link then drops to 1.5 Gbps, but the errors continue until the drive is
kicked from the RAID array about 50 minutes later.
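The SErr and SAct values in these exception lines are bitmasks. Below is a minimal Python sketch for decoding them; the bit-to-name table follows the SATA SError register layout with the abbreviations libata prints, and the helper names are hypothetical. Decoding 0x400100 gives UnrecovData plus Handshk. A handshake error means a frame sent by the host was answered with R_ERR, which is commonly read as a link-level problem (cable, connectors, power, or the SATA interface at either end) rather than a media problem, though that is an interpretation, not a diagnosis. SAct has one bit per NCQ tag still in flight.

```python
# SError bits and the abbreviations libata prints for them (SATA ERR field
# in bits 0-11, DIAG field in bits 16-26).
SERROR_BITS = {
    0: "RecovData", 1: "RecovComm", 8: "UnrecovData", 9: "Persist",
    10: "Proto", 11: "HostInt", 16: "PHYRdyChg", 17: "PHYInt",
    18: "CommWake", 19: "10B8B", 20: "Dispar", 21: "BadCRC",
    22: "Handshk", 23: "LinkSeq", 24: "TrStaTrns", 25: "UnrecFIS",
    26: "DevExch",
}


def decode_serror(value: int) -> list[str]:
    """Names of the SError bits set in value, lowest bit first."""
    return [name for bit, name in sorted(SERROR_BITS.items())
            if value & (1 << bit)]


def outstanding_tags(sact: int) -> list[int]:
    """NCQ tags marked active in SAct (bit n set means tag n is in flight)."""
    return [tag for tag in range(32) if sact & (1 << tag)]


print(decode_serror(0x400100))  # ['UnrecovData', 'Handshk']
print(decode_serror(0xC00000))  # ['Handshk', 'LinkSeq']
print(len(outstanding_tags(0x7FF4F)))  # 16
```

So in the first report 16 queued commands were outstanding when the port froze, and the boot-time error (SErr 0xc00000) decodes to Handshk plus LinkSeq, again link-level flags.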

[10477.764175] ata2.00: status: { DRDY }
[10477.764179] ata2: hard resetting link
[10478.248019] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[10478.318670] ata2.00: configured for UDMA/33
[10478.318679] end_request: I/O error, dev sdb, sector 989067690
[10478.318685] raid1: Disk failure on sdb3, disabling device.
[10478.318686] raid1: Operation continuing on 1 devices.
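The SStatus and SControl values on the "SATA link up" lines are printed in hex and pack three 4-bit fields: DET (bits 3:0), SPD (bits 7:4) and IPM (bits 11:8). The short Python sketch below (helper names hypothetical) shows what changed between the two resets: SStatus 123 is a device present with PHY communication established at Gen2 (3.0 Gbps), 113 is the same at Gen1 (1.5 Gbps), and SControl 310 sets SPD to 1, which caps negotiation at 1.5 Gbps. That looks like libata stepping the link speed down after repeated errors; the "configured for UDMA/33" line appears to reflect the transfer mode being lowered as well.

```python
GEN_SPEED = {1: "1.5 Gbps", 2: "3.0 Gbps"}


def split_fields(reg: int) -> tuple[int, int, int]:
    """Split an SStatus/SControl value into its (IPM, SPD, DET) nibbles."""
    return (reg >> 8) & 0xF, (reg >> 4) & 0xF, reg & 0xF


def describe_sstatus(text: str) -> str:
    ipm, spd, det = split_fields(int(text, 16))  # libata prints these in hex
    link = "device present, PHY up" if det == 3 else f"DET={det}"
    state = "active" if ipm == 1 else f"IPM={ipm}"
    return f"{link}, {GEN_SPEED.get(spd, f'SPD={spd}')}, {state}"


def describe_scontrol(text: str) -> str:
    ipm, spd, det = split_fields(int(text, 16))
    # SPD=0 means no restriction; 1 or 2 caps the negotiated speed.
    limit = "no speed limit" if spd == 0 else f"limit {GEN_SPEED.get(spd, f'SPD={spd}')}"
    return f"{limit}, IPM={ipm}, DET={det}"


print(describe_sstatus("123"))   # device present, PHY up, 3.0 Gbps, active
print(describe_sstatus("113"))   # device present, PHY up, 1.5 Gbps, active
print(describe_scontrol("300"))  # no speed limit, IPM=3, DET=0
print(describe_scontrol("310"))  # limit 1.5 Gbps, IPM=3, DET=0
```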

This drive also encountered a similar error on bootup the next day:
[    9.389771] ata2.00: exception Emask 0x10 SAct 0xf SErr 0xc00000 action 0x6 frozen
[    9.389774] ata2.00: irq_stat 0x0c000000, interface fatal error
[    9.389776] ata2: SError: { Handshk LinkSeq }
[    9.389780] ata2.00: cmd 60/02:00:3f:af:4e/00:00:00:00:00/40 tag 0 ncq 1024 in
[    9.389781]          res 40/00:10:41:af:4e/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
[    9.389783] ata2.00: status: { DRDY }
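On the dmesg question: each failed command is dumped as a taskfile, and the "ncq" token on the cmd line already says libata issued it as a queued command. The Python sketch below decodes the taskfile, assuming the field order libata uses (command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device); `decode_cmd` is a hypothetical name. Opcodes 0x60 and 0x61 are READ FPDMA QUEUED and WRITE FPDMA QUEUED, the NCQ commands, which carry the sector count in the feature registers and the tag in bits 7:3 of the sector-count register. Keep in mind that libata lists every command in flight when the port froze, so NCQ commands in these reports show that NCQ was in use, not by themselves that NCQ caused the errors.

```python
import re

# Opcodes of the two NCQ (First-Party DMA queued) commands.
NCQ_COMMANDS = {0x60: "READ FPDMA QUEUED", 0x61: "WRITE FPDMA QUEUED"}

# Assumed layout of libata's dump: cmd CC/FF:NN:LL:LM:LH/HF:HN:HL:HM:HH/DD
FIVE_BYTES = r"((?:[0-9a-f]{2}:){4}[0-9a-f]{2})"
CMD_RE = re.compile(r"cmd ([0-9a-f]{2})/" + FIVE_BYTES + "/" + FIVE_BYTES
                    + r"/[0-9a-f]{2}", re.IGNORECASE)


def decode_cmd(line: str) -> dict:
    """Hypothetical helper: decode the 'cmd' taskfile from one dmesg line."""
    m = CMD_RE.search(line)
    if m is None:
        raise ValueError("no libata cmd taskfile in line")
    command = int(m.group(1), 16)
    feature, nsect, lbal, lbam, lbah = (int(b, 16) for b in m.group(2).split(":"))
    hob_feature, _, hob_lbal, hob_lbam, hob_lbah = (
        int(b, 16) for b in m.group(3).split(":"))
    info = {
        "command": command,
        "name": NCQ_COMMANDS.get(command, "not an NCQ command"),
        # 48-bit LBA: the high-order bytes come from the HOB registers.
        "lba": (hob_lbah << 40 | hob_lbam << 32 | hob_lbal << 24
                | lbah << 16 | lbam << 8 | lbal),
    }
    if command in NCQ_COMMANDS:
        # FPDMA commands carry the sector count in FEATURE (15:0)
        # and the queue tag in bits 7:3 of the sector-count register.
        info["sectors"] = hob_feature << 8 | feature
        info["tag"] = nsect >> 3
    return info


d = decode_cmd("ata2.00: cmd 61/40:08:b1:4f:c0/00:00:01:00:00/40 tag 1 ncq 32768 out")
print(d["name"], d["tag"], d["sectors"] * 512)  # WRITE FPDMA QUEUED 1 32768
```

For the tag 0 write in the first report this gives 80 sectors (40960 bytes), and for the boot-time read 2 sectors (1024 bytes), matching the byte counts libata printed.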

From lspci -vvv:

00:1f.2 SATA controller: Intel Corporation 82801IR/IO/IH (ICH9R/DO/DH) 6 port SATA AHCI Controller (rev 02) (prog-if 01)
        Subsystem: ASUSTeK Computer Inc. Device 8277
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin B routed to IRQ 2299
        Region 0: I/O ports at 9c00 [size=8]
        Region 1: I/O ports at 9880 [size=4]
        Region 2: I/O ports at 9800 [size=8]
        Region 3: I/O ports at 9480 [size=4]
        Region 4: I/O ports at 9400 [size=32]
        Region 5: Memory at f9ffe800 (32-bit, non-prefetchable) [size=2K]
        Capabilities: [80] Message Signalled Interrupts: Mask- 64bit- Queue=0/4 Enable+
                Address: fee0f00c  Data: 4181
        Capabilities: [70] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot+,D3cold-)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [a8] SATA HBA <?>
        Capabilities: [b0] Vendor Specific Information <?>
        Kernel driver in use: ahci
        Kernel modules: ahci

--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html