Hi!

I created a sw-raid md0 and an LVM on top of it with four 250GB Samsung
SATA disks a couple of months ago. I am not a RAID expert, but I thought
I could handle it with a little help from my friends from grml: Andreas
"jimmy" Gredler and Michael "mika" Prokop.

,----
| md0 <future mds>    (PVs on partitions or whole disks)
|    \   /
|     \ /
|    datavg           (VG)
|      |
|      |
|    datalv           (LV)
|      |
|    ext3             (filesystem)
`----

HW: Promise FastTrack SATA controller on a P3 board. (A previously
used - and preferred - Dawicontrol DC-150 did not work at all: I could
not access the hdds.)

Approximately once a month there was a short timeout that caused a disk
to be removed from the raid. A SMART check and a resync (hot-add) solved
the problem so far (roughly the procedure sketched after the short
summary below).

,----[ syslog ]
| May  1 23:12:51 ned kernel: ata2: command timeout
| May  1 23:12:51 ned kernel: ata2: translated ATA stat/err 0x25/00 to SCSI SK/ASC/ASCQ 0x4/00/00
| May  1 23:12:51 ned kernel: ata2: status=0x25 { DeviceFault CorrectedError Error }
| May  1 23:12:51 ned kernel: SCSI error : <1 0 0 0> return code = 0x8000002
| May  1 23:12:51 ned kernel: sdb: Current: sense key: Hardware Error
| May  1 23:12:51 ned kernel: Additional sense: No additional sense information
| May  1 23:12:51 ned kernel: end_request: I/O error, dev sdb, sector 179281983
| May  1 23:12:51 ned kernel: raid5: Disk failure on sdb1, disabling device. Operation continuing on 3 devices
| May  1 23:12:51 ned kernel: RAID5 conf printout:
| May  1 23:12:51 ned kernel:  --- rd:4 wd:3 fd:1
| May  1 23:12:51 ned kernel:  disk 0, o:1, dev:sda1
| May  1 23:12:51 ned kernel:  disk 1, o:0, dev:sdb1
| May  1 23:12:51 ned kernel:  disk 2, o:1, dev:sdc1
| May  1 23:12:51 ned kernel:  disk 3, o:1, dev:sdd1
| May  1 23:12:51 ned kernel: RAID5 conf printout:
| May  1 23:12:51 ned kernel:  --- rd:4 wd:3 fd:1
| May  1 23:12:51 ned kernel:  disk 0, o:1, dev:sda1
| May  1 23:12:51 ned kernel:  disk 2, o:1, dev:sdc1
| May  1 23:12:51 ned kernel:  disk 3, o:1, dev:sdd1
`----

But two weeks ago there was another timeout during such a resync, and
that was the beginning of my problem.

Short summary (for the impatient)
=============

sda and sdb were removed, hot-adding did not work out, and I mistakenly
thought that removing and adding the drives again could solve my
problem. Bad idea.
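For reference, the monthly recovery that worked until now looked roughly
like this (a sketch from memory; the failed disk was not always sdb, so
the device names are only examples):

,----[ sketch: monthly hot-add recovery ]
| # remove the kicked member if it is still listed as faulty
| mdadm /dev/md0 --remove /dev/sdb1
| # short SMART self-test plus health verdict on the affected disk
| smartctl -t short /dev/sdb
| smartctl -H /dev/sdb
| # hot-add the partition again; this starts the resync
| mdadm /dev/md0 --add /dev/sdb1
| # follow the rebuild
| cat /proc/mdstat
`----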
Now I am not able to get the raid working: all drives are marked as
spares and they can't be assembled:

root@ned ~ # mdadm --examine /dev/sd[abcd]1
/dev/sda1:
          Magic : a92b4efc
        Version : 00.90.02
           UUID : 15f07005:037e4abf:70f51389:83dde0ed
  Creation Time : Sun Jan 29 21:35:05 2006
     Raid Level : raid5
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 0

    Update Time : Sun Jul  2 17:23:03 2006
          State : clean
 Active Devices : 0
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 4
       Checksum : 4eb2dfe6 - correct
         Events : 0.1652541

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     4       8        1        4      spare   /dev/sda1

   0     0       0        0        0      removed
   1     1       0        0        1      faulty removed
   2     2       0        0        2      faulty removed
   3     3       0        0        3      faulty removed
   4     4       8        1        4      spare   /dev/sda1
   5     5       8       33        5      spare   /dev/sdc1
   6     6       8       17        6      spare   /dev/sdb1
   7     7       8       49        7      spare   /dev/sdd1
/dev/sdb1:
          Magic : a92b4efc
        Version : 00.90.02
           UUID : 15f07005:037e4abf:70f51389:83dde0ed
  Creation Time : Sun Jan 29 21:35:05 2006
     Raid Level : raid5
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 0

    Update Time : Sun Jul  2 17:23:03 2006
          State : clean
 Active Devices : 0
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 4
       Checksum : 4eb2dffa - correct
         Events : 0.1652541

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     6       8       17        6      spare   /dev/sdb1

   0     0       0        0        0      removed
   1     1       0        0        1      faulty removed
   2     2       0        0        2      faulty removed
   3     3       0        0        3      faulty removed
   4     4       8        1        4      spare   /dev/sda1
   5     5       8       33        5      spare   /dev/sdc1
   6     6       8       17        6      spare   /dev/sdb1
   7     7       8       49        7      spare   /dev/sdd1
/dev/sdc1:
          Magic : a92b4efc
        Version : 00.90.02
           UUID : 15f07005:037e4abf:70f51389:83dde0ed
  Creation Time : Sun Jan 29 21:35:05 2006
     Raid Level : raid5
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 0

    Update Time : Sun Jul  2 17:23:03 2006
          State : clean
 Active Devices : 0
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 4
       Checksum : 4eb2e008 - correct
         Events : 0.1652541

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     5       8       33        5      spare   /dev/sdc1

   0     0       0        0        0      removed
   1     1       0        0        1      faulty removed
   2     2       0        0        2      faulty removed
   3     3       0        0        3      faulty removed
   4     4       8        1        4      spare   /dev/sda1
   5     5       8       33        5      spare   /dev/sdc1
   6     6       8       17        6      spare   /dev/sdb1
   7     7       8       49        7      spare   /dev/sdd1
/dev/sdd1:
          Magic : a92b4efc
        Version : 00.90.02
           UUID : 15f07005:037e4abf:70f51389:83dde0ed
  Creation Time : Sun Jan 29 21:35:05 2006
     Raid Level : raid5
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 0

    Update Time : Sun Jul  2 17:23:03 2006
          State : clean
 Active Devices : 0
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 4
       Checksum : 4eb2e01c - correct
         Events : 0.1652541

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     7       8       49        7      spare   /dev/sdd1

   0     0       0        0        0      removed
   1     1       0        0        1      faulty removed
   2     2       0        0        2      faulty removed
   3     3       0        0        3      faulty removed
   4     4       8        1        4      spare   /dev/sda1
   5     5       8       33        5      spare   /dev/sdc1
   6     6       8       17        6      spare   /dev/sdb1
   7     7       8       49        7      spare   /dev/sdd1
root@ned ~ #

root@grml ~ # date;cat /proc/mdstat
Di Jul  4 21:36:15 CEST 2006
Personalities : [linear] [raid0] [raid1] [raid10] [raid5] [raid4] [raid6] [multipath]
unused devices: <none>
root@grml ~ # mdadm --detail /dev/md0
mdadm: md device /dev/md0 does not appear to be active.
1 root@grml ~ # mdadm --assemble /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1
mdadm: /dev/md0 assembled from 0 drives and 4 spares - not enough to start the array.
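As far as I can tell, the four superblocks are consistent with each
other: same UUID, same update time and the same event counter
(0.1652541) on every member. A quick way to compare them side by side
is a one-liner like this (just a sketch, the grep pattern may need
tweaking):

  mdadm --examine /dev/sd[abcd]1 | egrep 'UUID|Update Time|Events'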
Further attempts with --force and --run did not help either:

1 root@grml ~ # mdadm --stop /dev/md0
root@grml ~ # mdadm --assemble /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1 --force
mdadm: /dev/md0 assembled from 0 drives and 4 spares - not enough to start the array.
1 root@grml ~ # mdadm --zero-superblock /dev/sda
mdadm: Couldn't open /dev/sda for write - not zeroing
1 root@grml ~ # mdadm --assemble /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1 --run
mdadm: failed to RUN_ARRAY /dev/md0: Input/output error
1 root@grml ~ #

Andreas Gredler suggested the following lines as a last attempt, but
with the risk of losing data, which I want to avoid:

mdadm --stop /dev/md0
mdadm --zero-superblock /dev/sda
mdadm --zero-superblock /dev/sdb
mdadm --zero-superblock /dev/sdc
mdadm --zero-superblock /dev/sdd
mdadm --assemble /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1 --force
mdadm --create -n 4 -l 5 /dev/md0 missing /dev/sdb1 /dev/sdc1 /dev/sdd1

(A slightly more cautious variant of the last step is sketched in the PS
at the end of this mail.)

Is there another solution to get to my data? Thank you!

Background history (the whole story - director's cut)
==================

I published the whole story (as much as I could log during my reboots
and so on) on the web:

http://paste.debian.net/8779

It is available for 72h from now on. If you want to read it afterwards,
please write me an email and I will send the log to you. Please feel
free to visit that page, and do not hesitate to tell me what else I can
check!

mdadm version: 1.12.0-1
uname: Linux ned 2.6.13-grml #1 Tue Oct 4 18:24:46 CEST 2005 i686 GNU/Linux
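PS: In case the last-resort recreate really becomes necessary, would a
read-only check along these lines be the safer way to do it? This is
only a sketch built from the parameters reported by --examine above
(chunk size 64K, left-symmetric layout, original member order sda1 sdb1
sdc1 sdd1) and from my VG/LV names; I have not run it, and I am aware
that a wrong device order or chunk size here would destroy the data:

,----[ sketch: cautious recreate with read-only verification ]
| # recreate the metadata degraded (sda1 left out as "missing"), so no
| # resync/parity rewrite can start; chunk and layout given explicitly
| mdadm --create /dev/md0 --level=5 --raid-devices=4 --chunk=64 \
|       --layout=left-symmetric missing /dev/sdb1 /dev/sdc1 /dev/sdd1
| # activate the volume group that sits on top of md0
| vgscan
| vgchange -ay datavg
| # check the filesystem without writing, then mount read-only
| fsck.ext3 -n /dev/datavg/datalv
| mount -o ro /dev/datavg/datalv /mnt
`----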