Over the past 5 months, I've had a drive kicked out of one of my RAID arrays about 6 times. In each case the drive passes its SMART tests, so I --remove it, --re-add it, and it resyncs successfully.

I tried disconnecting and reconnecting all four SATA cables, but the problem occurred again. In fact, today *two* partitions were kicked out of their (different) RAID devices.

All of the problems occurred with sda and sdc, which are older drives:

sda: SAMSUNG SP2004C
sdc: SAMSUNG SP2504C

hddtemp shows the temperatures at 32C.

The system runs Debian lenny, with a newer kernel than lenny's: 2.6.28. The mdadm version is v2.6.7.2.

The motherboard is a Gigabyte GA-E7AUM-DS2H. I couldn't find the controller chipset info.

Are the drives just bad? Or is it the controller?

More detailed information is below.

Thanks for any help! Let me know if I should provide more information.

Dan

syslog messages from today:

Jun 2 03:54:22 boots kernel: [66986.000043] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Jun 2 03:54:23 boots kernel: [66986.000052] ata1.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
Jun 2 03:54:23 boots kernel: [66986.000053] res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Jun 2 03:54:23 boots kernel: [66986.000056] ata1.00: status: { DRDY }
Jun 2 03:54:23 boots kernel: [66986.000064] ata1: hard resetting link
Jun 2 03:54:23 boots kernel: [66986.484037] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Jun 2 03:54:23 boots kernel: [66986.494003] ata1.00: configured for UDMA/133
Jun 2 03:54:23 boots kernel: [66986.494016] end_request: I/O error, dev sda, sector 187880006
Jun 2 03:54:23 boots kernel: [66986.494023] md: super_written gets error=-5, uptodate=0
Jun 2 03:54:23 boots kernel: [66986.494027] raid5: Disk failure on sda7, disabling device.
Jun 2 03:54:24 boots kernel: [66986.494029] raid5: Operation continuing on 3 devices.
Jun 2 03:54:24 boots kernel: [66986.494045] ata1: EH complete
Jun 2 03:54:24 boots kernel: [66986.494215] sd 0:0:0:0: [sda] 390719855 512-byte hardware sectors: (200 GB/186 GiB)
Jun 2 03:54:24 boots kernel: [66986.494244] sd 0:0:0:0: [sda] Write Protect is off
Jun 2 03:54:24 boots kernel: [66986.494248] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
Jun 2 03:54:24 boots kernel: [66986.494274] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Jun 2 03:54:24 boots kernel: [66986.762936] RAID5 conf printout:
Jun 2 03:54:24 boots mdadm[4109]: Fail event detected on md device /dev/md3, component device /dev/sda7
Jun 2 03:54:24 boots kernel: [66986.762942] --- rd:4 wd:3
Jun 2 03:54:24 boots kernel: [66986.762946] disk 0, o:0, dev:sda7
Jun 2 03:54:24 boots kernel: [66986.762948] disk 1, o:1, dev:sdb3
Jun 2 03:54:24 boots kernel: [66986.762950] disk 2, o:1, dev:sdc5
Jun 2 03:54:24 boots kernel: [66986.762953] disk 3, o:1, dev:sdd3
Jun 2 03:54:24 boots kernel: [66986.763626] RAID5 conf printout:
Jun 2 03:54:24 boots kernel: [66986.763628] --- rd:4 wd:3
Jun 2 03:54:24 boots kernel: [66986.763630] disk 1, o:1, dev:sdb3
Jun 2 03:54:24 boots kernel: [66986.763632] disk 2, o:1, dev:sdc5
Jun 2 03:54:24 boots kernel: [66986.763634] disk 3, o:1, dev:sdd3
Jun 2 06:59:33 boots kernel: [78097.000087] ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Jun 2 06:59:34 boots kernel: [78097.000095] ata4.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
Jun 2 06:59:34 boots kernel: [78097.000096] res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Jun 2 06:59:34 boots kernel: [78097.000099] ata4.00: status: { DRDY }
Jun 2 06:59:34 boots kernel: [78097.000106] ata4: hard resetting link
Jun 2 06:59:34 boots kernel: [78097.484057] ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Jun 2 06:59:35 boots kernel: [78097.493930] ata4.00: configured for UDMA/133
Jun 2 06:59:35 boots kernel: [78097.493941] end_request: I/O error, dev sdc, sector 488391944
Jun 2 06:59:35 boots kernel: [78097.493947] md: super_written gets error=-5, uptodate=0
Jun 2 06:59:35 boots kernel: [78097.493952] raid5: Disk failure on sdc7, disabling device.
Jun 2 06:59:35 boots kernel: [78097.493953] raid5: Operation continuing on 2 devices.
Jun 2 06:59:35 boots kernel: [78097.493967] ata4: EH complete
Jun 2 06:59:35 boots kernel: [78097.494105] sd 3:0:0:0: [sdc] 488397168 512-byte hardware sectors: (250 GB/232 GiB)
Jun 2 06:59:35 boots kernel: [78097.494124] sd 3:0:0:0: [sdc] Write Protect is off
Jun 2 06:59:35 boots kernel: [78097.494127] sd 3:0:0:0: [sdc] Mode Sense: 00 3a 00 00
Jun 2 06:59:35 boots kernel: [78097.494156] sd 3:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Jun 2 06:59:35 boots mdadm[4109]: Fail event detected on md device /dev/md5, component device /dev/sdc7
Jun 2 06:59:35 boots kernel: [78097.635934] RAID5 conf printout:
Jun 2 06:59:35 boots kernel: [78097.635938] --- rd:3 wd:2
Jun 2 06:59:35 boots kernel: [78097.635941] disk 0, o:1, dev:sdb6
Jun 2 06:59:35 boots kernel: [78097.635944] disk 1, o:0, dev:sdc7
Jun 2 06:59:35 boots kernel: [78097.635946] disk 2, o:1, dev:sdd6
Jun 2 06:59:36 boots kernel: [78097.636143] RAID5 conf printout:
Jun 2 06:59:36 boots kernel: [78097.636146] --- rd:3 wd:2
Jun 2 06:59:36 boots kernel: [78097.636148] disk 0, o:1, dev:sdb6
Jun 2 06:59:36 boots kernel: [78097.636150] disk 2, o:1, dev:sdd6

------------------------
/proc/mdstat:

Personalities : [raid1] [raid6] [raid5] [raid4]
md6 : active raid1 sdb7[0] sdd7[1]
      196290048 blocks [2/2] [UU]
      bitmap: 1/3 pages [4KB], 32768KB chunk

md5 : active raid5 sdc7[3] sdb6[0] sdd6[2]
      175815168 blocks level 5, 64k chunk, algorithm 2 [3/2] [U_U]
      [=================>...]  recovery = 89.3% (78552568/87907584) finish=4.6min speed=33323K/sec
      bitmap: 1/2 pages [4KB], 32768KB chunk

md4 : active raid5 sda8[0] sdd5[3] sdc6[2] sdb5[1]
      218636160 blocks level 5, 64k chunk, algorithm 2 [4/4] [UUUU]
      bitmap: 0/2 pages [0KB], 32768KB chunk

md3 : active raid5 sda7[4] sdd3[3] sdc5[2] sdb3[1]
      218612160 blocks level 5, 64k chunk, algorithm 2 [4/3] [_UUU]
        resync=DELAYED
      bitmap: 2/2 pages [8KB], 32768KB chunk

md2 : active raid5 sda6[0] sdd2[3] sdc2[2] sdb2[1]
      30748032 blocks level 5, 64k chunk, algorithm 2 [4/4] [UUUU]
      bitmap: 1/1 pages [4KB], 32768KB chunk

md0 : active raid5 sda2[0] sdd1[2] sdc1[1]
      578048 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]
      bitmap: 0/1 pages [0KB], 32768KB chunk

md1 : active raid1 sdb1[0] sda5[1]
      289024 blocks [2/2] [UU]
      bitmap: 0/1 pages [0KB], 32768KB chunk

unused devices: <none>

------------------
/etc/mdadm/mdadm.conf:

# mdadm.conf
#
# Please refer to mdadm.conf(5) for information about this file.
#

# by default, scan all partitions (/proc/partitions) for MD superblocks.
# alternatively, specify devices to scan, using wildcards if desired.
DEVICE partitions

# auto-create devices with Debian standard permissions
CREATE owner=root group=disk mode=0660 auto=yes

# automatically tag new arrays as belonging to the local system
HOMEHOST <system>

# instruct the monitoring daemon where to send mail alerts
MAILADDR jdc@xxxxxx

# definitions of existing MD arrays
ARRAY /dev/md1 level=raid1 num-devices=2 UUID=6b8b4567:327b23c6:643c9869:66334873
ARRAY /dev/md0 level=raid5 num-devices=3 UUID=ba493129:00074cd3:fee07e15:038135d5
ARRAY /dev/md2 level=raid5 num-devices=4 UUID=3dc9b50b:b9270472:9778d943:b967813b
ARRAY /dev/md3 level=raid5 num-devices=4 UUID=c4056d19:7b4bb550:44925b88:91d5bc8a
ARRAY /dev/md4 level=raid5 num-devices=4 UUID=d7c84402:210b78c7:556bbbc0:47df436c
ARRAY /dev/md5 level=raid5 num-devices=3 UUID=9effd43f:93ccc32d:899ca6c7:ea966964
ARRAY /dev/md6 level=raid1 num-devices=2 UUID=da17264f:be7e012d:85187211:fb0e2ebd
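
For anyone reading the /proc/mdstat output quoted earlier, the degraded arrays are the ones whose member map contains an underscore (md5 shows [U_U], md3 shows [_UUU]). A minimal Python sketch that picks these out might look like the following. The function name degraded_arrays and the SAMPLE excerpt are my own illustration, not part of mdadm, and the regular expressions assume the mdstat layout shown in this post.

```python
import re

# Excerpt of the /proc/mdstat output from this post (three of the seven arrays).
SAMPLE = """\
md6 : active raid1 sdb7[0] sdd7[1]
      196290048 blocks [2/2] [UU]
md5 : active raid5 sdc7[3] sdb6[0] sdd6[2]
      175815168 blocks level 5, 64k chunk, algorithm 2 [3/2] [U_U]
md3 : active raid5 sda7[4] sdd3[3] sdc5[2] sdb3[1]
      218612160 blocks level 5, 64k chunk, algorithm 2 [4/3] [_UUU]
"""


def degraded_arrays(mdstat_text):
    """Return {array_name: member_map} for arrays with a missing member.

    Hypothetical helper: assumes each array starts with a line like
    "mdN : ..." followed by a line containing "[n/m] [UU_U]".
    """
    degraded = {}
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\d+)\s*:", line)
        if header:
            current = header.group(1)
            continue
        status = re.search(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]", line)
        if status and current and "_" in status.group(3):
            degraded[current] = status.group(3)
    return degraded


if __name__ == "__main__":
    # On a live system, pass open("/proc/mdstat").read() instead of SAMPLE.
    print(degraded_arrays(SAMPLE))
```

Run against the excerpt, this reports md5 and md3, which matches the two Fail events in the syslog above.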