SATA disk dies and revives after boot

Hi,

Yesterday the errors below appeared on my home Xen server; dom0 is Debian stable with the Ubuntu kernel 2.6.24-16-xen. Since yesterday was Sunday, the machine was mostly idle (I guess we were somewhere between church and home at the time). After this happens, the disk does not respond to anything and the machine needs a reboot to return to sanity. After that it may work for some time (days or weeks). I recently ran a long SMART test, which returned no errors. After a reboot the disk also seems to be just fine (except that I need to re-add it to the RAID1 arrays). I have also had this disk connected to a Promise controller, and the same thing happened there.
It did this with 2.6.18 as well.

May 18 13:06:15 quark kernel: [174871.044304] ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
May 18 13:06:15 quark kernel: [174871.044353] ata5.00: cmd e7/00:00:00:00:00/00:00:00:00:00/a0 tag 0
May 18 13:06:15 quark kernel: [174871.044355]          res 40/00:00:01:01:80/00:00:00:00:00/00 Emask 0x4 (timeout)
May 18 13:06:15 quark kernel: [174871.044412] ata5.00: status: { DRDY }
May 18 13:06:20 quark kernel: [174876.082713] ata5: port is slow to respond, please be patient (Status 0xd0)
May 18 13:06:25 quark kernel: [174881.065279] ata5: soft resetting link
May 18 13:06:55 quark kernel: [174911.291301] ata5.00: qc timeout (cmd 0xec)
May 18 13:06:55 quark kernel: [174911.291337] ata5.00: failed to IDENTIFY (I/O error, err_mask=0x4)
May 18 13:06:55 quark kernel: [174911.291361] ata5.00: revalidation failed (errno=-5)
May 18 13:06:55 quark kernel: [174911.291384] ata5: failed to recover some devices, retrying in 5 secs
May 18 13:07:05 quark kernel: [174921.328542] ata5: port is slow to respond, please be patient (Status 0xd0)
May 18 13:07:10 quark kernel: [174926.312085] ata5: soft resetting link
May 18 13:07:40 quark kernel: [174956.537601] ata5.00: qc timeout (cmd 0xec)
May 18 13:07:40 quark kernel: [174956.537638] ata5.00: failed to IDENTIFY (I/O error, err_mask=0x4)
May 18 13:07:40 quark kernel: [174956.537662] ata5.00: revalidation failed (errno=-5)
May 18 13:07:40 quark kernel: [174956.537685] ata5: failed to recover some devices, retrying in 5 secs
May 18 13:07:50 quark kernel: [174966.580807] ata5: port is slow to respond, please be patient (Status 0xd0)
May 18 13:07:55 quark kernel: [174971.564289] ata5: soft resetting link
May 18 13:08:26 quark kernel: [175001.790832] ata5.00: qc timeout (cmd 0xec)
May 18 13:08:26 quark kernel: [175001.790867] ata5.00: failed to IDENTIFY (I/O error, err_mask=0x4)
May 18 13:08:26 quark kernel: [175001.790891] ata5.00: revalidation failed (errno=-5)
May 18 13:08:26 quark kernel: [175001.790914] ata5.00: disabled
May 18 13:08:31 quark kernel: [175007.327614] ata5: port is slow to respond, please be patient (Status 0xd0)
May 18 13:08:36 quark kernel: [175012.311144] ata5: soft resetting link
May 18 13:08:36 quark kernel: [175012.478592] ata5: EH complete
May 18 13:08:36 quark kernel: [175012.478684] sd 4:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
May 18 13:08:36 quark kernel: [175012.478726] end_request: I/O error, dev sdb, sector 62412332
May 18 13:08:36 quark kernel: [175012.478751] md: super_written gets error=-5, uptodate=0
May 18 13:08:36 quark kernel: [175012.478777] raid1: Disk failure on sdb5, disabling device.

The ata/disk info from dmesg:
[    4.716187] sata_via 0000:00:0f.0: version 2.3
[    4.716429] sata_via 0000:00:0f.0: routed to hard irq line 10
[    4.720203] scsi3 : sata_via
[    4.721459] scsi4 : sata_via
[    4.721675] ata5: SATA max UDMA/133 cmd 0xd400 ctl 0xd000 bmdma 0xcc08 irq 20
[    5.135540] ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[    5.299951] ata5.00: ATA-7: Maxtor 6Y080M0, YAR511W0, max UDMA/133
[    5.300033] ata5.00: 160086528 sectors, multi 16: LBA
[    5.315957] ata5.00: configured for UDMA/133
[    5.316356] sd 4:0:0:0: [sdb] 160086528 512-byte hardware sectors (81964 MB)
[    5.316448] sd 4:0:0:0: [sdb] Write Protect is off
[    5.316526] sd 4:0:0:0: [sdb] Mode Sense: 00 3a 00 00
[    5.316543] sd 4:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[    5.316684] sd 4:0:0:0: [sdb] 160086528 512-byte hardware sectors (81964 MB)
[    5.316772] sd 4:0:0:0: [sdb] Write Protect is off
[    5.316850] sd 4:0:0:0: [sdb] Mode Sense: 00 3a 00 00
[    5.316865] sd 4:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[    5.316960]  sdb: sdb2 < sdb5 sdb6 sdb7 sdb8 >
[    5.404184] sd 4:0:0:0: [sdb] Attached SCSI disk

It is this sata controller:
00:0f.0 RAID bus controller: VIA Technologies, Inc. VIA VT6420 SATA RAID Controller (rev 80)
   Subsystem: Micro-Star International Co., Ltd. K8T Neo 2 Motherboard
   Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
   Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
   Latency: 128
   Interrupt: pin B routed to IRQ 20
   Region 0: I/O ports at dc00 [size=8]
   Region 1: I/O ports at d800 [size=4]
   Region 2: I/O ports at d400 [size=8]
   Region 3: I/O ports at d000 [size=4]
   Region 4: I/O ports at cc00 [size=16]
   Region 5: I/O ports at c800 [size=256]
   Capabilities: [c0] Power Management version 2
       Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
       Status: D0 PME-Enable- DSel=0 DScale=0 PME-

This is the disk:
quark:~# hdparm -i /dev/sdb

/dev/sdb:

Model=Maxtor 6Y080M0, FwRev=YAR511W0, SerialNo=Y236DHAC
Config={ Fixed }
RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=4
BuffType=DualPortCache, BuffSize=7936kB, MaxMultSect=16, MultSect=?16?
CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=160086528
IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
PIO modes:  pio0 pio1 pio2 pio3 pio4
DMA modes:  mdma0 mdma1 mdma2
UDMA modes: udma0 udma1 udma2 udma3 udma4 udma5
AdvancedPM=yes: disabled (255)
WriteCache=enabled
Drive conforms to: ATA/ATAPI-7 T13 1532D revision 0: ATA/ATAPI-1 ATA/ATAPI-2 ATA/ATAPI-3 ATA/ATAPI-4 ATA/ATAPI-5 ATA/ATAPI-6 ATA/ATAPI-7

* signifies the current active mode

If any other information would help, please ask.
I don't understand the error codes, so I have no clue what fails or why.
I would welcome suggestions on how to get this disk back online the next time this happens. The other SATA port on this controller is unused, but the PATA function at 00:0f.1 is in use, so if there is something I can do to the controller without disturbing the PATA side... (I'm thinking of powering down the disk and/or controller from the command line.)
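
To make that last idea a bit more concrete, here is an untested sketch of the kind of thing I mean. It assumes the generic SCSI sysfs files (the per-device `delete` file and the per-host `scan` file) work with sata_via the way they do with other drivers. The function name `rescan_disk` and the `SYSFS_ROOT` override are made up for illustration, and `host4` is my reading of the "scsi4 : sata_via" and "sd 4:0:0:0" lines above. I have no idea whether this can revive a drive that no longer answers IDENTIFY.

```shell
#!/bin/sh
# Sketch only: drop a hung disk from the SCSI layer, then ask its host to rescan.
# SYSFS_ROOT is a hypothetical override so the function can be dry-run on a fake tree.
rescan_disk() {
    disk="$1"                     # block device name, e.g. sdb
    host="$2"                     # SCSI host, e.g. host4 (assumed from "sd 4:0:0:0")
    root="${SYSFS_ROOT:-/sys}"

    # Remove the stale device from the SCSI layer if it is still listed.
    if [ -e "$root/block/$disk/device/delete" ]; then
        echo 1 > "$root/block/$disk/device/delete"
    fi

    # Wildcard rescan of the host: any channel, any target, any LUN.
    if [ ! -e "$root/class/scsi_host/$host/scan" ]; then
        echo "no scan file for $host" >&2
        return 1
    fi
    echo "- - -" > "$root/class/scsi_host/$host/scan"
}
```

It would be run as root as `rescan_disk sdb host4`. Since it only addresses one SCSI host, I would expect it to leave the PATA side alone, but that is an assumption on my part.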

I'm not really keen on testing patches, because this is my home server and the rest of the family will not thank me for experimenting.

Thanks,
Jan Evert



