Hi,

I recently built a RAID-5 array from four 320 GB drives, moved the system and other data across, and everything was working nicely. One morning I awoke to find a degraded-array email, and on further investigation I noticed the following in /var/log/messages. The RAID-5 array was running in degraded mode; it had evidently noticed that /dev/sdm and sdm2 had reappeared, and listed them again as a failed spare in the output of mdadm --detail. However, I could not see sdm with fdisk.

To keep the story short: does /var/log/messages give any hints as to why the machine decided that /dev/sdm had become inaccessible? The affected RAID-5 array runs off a SiI 3114 controller, so NCQ issues are not possible.

# uname -a
Linux x 2.6.23.15-80.fc7 #1 SMP Sun Feb 10 17:29:10 EST 2008 i686 athlon i386 GNU/Linux

/var/log/messages:
Jun 6 04:27:10 x kernel: ata13.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
Jun 6 04:29:13 x kernel: ata13.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0 cdb 0x0 data 0
Jun 6 04:29:13 x kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Jun 6 04:29:13 x kernel: ata13: port is slow to respond, please be patient (Status 0xd0)
Jun 6 04:29:13 x kernel: ata13: device not ready (errno=-16), forcing hardreset
Jun 6 04:29:13 x kernel: ata13: hard resetting port
Jun 6 04:29:13 x kernel: ata13: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Jun 6 04:29:13 x kernel: ata13.00: qc timeout (cmd 0xec)
Jun 6 04:29:13 x kernel: ata13.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Jun 6 04:29:13 x kernel: ata13.00: revalidation failed (errno=-5)
Jun 6 04:29:13 x kernel: ata13: failed to recover some devices, retrying in 5 secs
Jun 6 04:29:13 x kernel: ata13: hard resetting port
Jun 6 04:29:13 x kernel: ata13: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Jun 6 04:29:13 x kernel: ata13.00: qc timeout (cmd 0x27)
Jun 6 04:29:13 x kernel: ata13.00: ata_hpa_resize 1: hpa sectors (0) is smaller than sectors (625142448)
Jun 6 04:29:13 x kernel: ata13.00: failed to set xfermode (err_mask=0x40)
Jun 6 04:29:13 x kernel: ata13.00: limiting speed to UDMA/100:PIO3
Jun 6 04:29:13 x kernel: ata13: failed to recover some devices, retrying in 5 secs
Jun 6 04:29:13 x kernel: ata13: hard resetting port
Jun 6 04:29:13 x kernel: ata13: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Jun 6 04:29:13 x kernel: ata13.00: qc timeout (cmd 0xec)
Jun 6 04:29:13 x kernel: ata13.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Jun 6 04:29:13 x kernel: ata13.00: revalidation failed (errno=-5)
Jun 6 04:29:13 x kernel: ata13.00: disabled
Jun 6 04:29:13 x kernel: ata13: EH pending after completion, repeating EH (cnt=4)
Jun 6 04:29:13 x kernel: ata13: port is slow to respond, please be patient (Status 0xd0)
Jun 6 04:29:13 x kernel: ata13: device not ready (errno=-16), forcing hardreset
Jun 6 04:29:13 x kernel: ata13: hard resetting port
Jun 6 04:29:13 x kernel: ata13: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Jun 6 04:29:13 x kernel: ata13: EH complete
Jun 6 04:29:13 x kernel: sd 12:0:0:0: [sdm] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Jun 6 04:29:13 x kernel: end_request: I/O error, dev sdm, sector 350365623
Jun 6 04:29:13 x kernel: sd 12:0:0:0: [sdm] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Jun 6 04:29:13 x kernel: end_request: I/O error, dev sdm, sector 350365879
Jun 6 04:29:13 x kernel: sd 12:0:0:0: [sdm] READ CAPACITY failed
Jun 6 04:29:13 x kernel: sd 12:0:0:0: [sdm] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Jun 6 04:29:13 x kernel: sd 12:0:0:0: [sdm] Sense not available.
Jun 6 04:29:13 x kernel: sd 12:0:0:0: [sdm] Write Protect is off
Jun 6 04:29:13 x kernel: sd 12:0:0:0: [sdm] Asking for cache data failed
Jun 6 04:29:13 x kernel: sd 12:0:0:0: [sdm] Assuming drive cache: write through
Jun 6 04:29:13 x kernel: md: super_written gets error=-5, uptodate=0
Jun 6 04:29:13 x kernel: raid5: Disk failure on sdm2, disabling device. Operation continuing on 3 devices
Jun 6 04:29:13 x kernel: RAID5 conf printout:
Jun 6 04:29:13 x kernel: --- rd:4 wd:3
Jun 6 04:29:13 x kernel: disk 0, o:1, dev:sdj2
Jun 6 04:29:13 x kernel: disk 1, o:1, dev:sdl2
Jun 6 04:29:13 x kernel: disk 2, o:1, dev:sdk2
Jun 6 04:29:13 x kernel: disk 3, o:0, dev:sdm2
Jun 6 04:29:13 x kernel: RAID5 conf printout:
Jun 6 04:29:13 x kernel: --- rd:4 wd:3
Jun 6 04:29:13 x kernel: disk 0, o:1, dev:sdj2
Jun 6 04:29:13 x kernel: disk 1, o:1, dev:sdl2
Jun 6 04:29:13 x kernel: disk 2, o:1, dev:sdk2
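Logs like this get long quickly, so it can help to boil them down to the events that matter here: which sectors returned I/O errors, how often each ATA port was hard-reset, and which md members were kicked out. Below is a minimal Python sketch of that, assuming the classic syslog line format and the exact message wordings quoted in this log (other kernel versions may word them differently). The function name summarize_kernel_log is hypothetical, not part of any existing tool.

```python
import re
import sys
from collections import Counter, defaultdict

# Regexes for the message wordings seen in this particular log.
# These are assumptions about the text format, not a stable kernel interface.
IO_ERROR = re.compile(r"end_request: I/O error, dev (\w+), sector (\d+)")
HARD_RESET = re.compile(r"kernel: (ata\d+): hard resetting port")
MD_FAILURE = re.compile(r"raid5: Disk failure on (\w+), disabling device")


def summarize_kernel_log(lines):
    """Hypothetical helper: return (I/O error sectors per device,
    hard-reset count per ATA port, md members reported as failed)."""
    sectors = defaultdict(list)
    resets = Counter()
    failed = []
    for line in lines:
        m = IO_ERROR.search(line)
        if m:
            sectors[m.group(1)].append(int(m.group(2)))
            continue
        m = HARD_RESET.search(line)
        if m:
            resets[m.group(1)] += 1
            continue
        m = MD_FAILURE.search(line)
        if m:
            failed.append(m.group(1))
    return dict(sectors), dict(resets), failed


# Only runs when a log path is given, e.g.: python summarize.py /var/log/messages
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], errors="replace") as fh:
        sectors, resets, failed = summarize_kernel_log(fh)
    print("I/O error sectors:", sectors)
    print("hard resets per port:", resets)
    print("md members failed:", failed)
```

Run against the excerpt above, this would report the two failing sectors on sdm, repeated hard resets on ata13, and sdm2 as the failed member; it only summarizes the log and does not say whether the drive, cable, or controller is at fault.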