I'm running a 4-disk software RAID5 array with Linux 2.6.12.1. Each disk is
an 80 GB IDE drive attached as master on its own IDE bus (no slave drives).
The array ran fine until a few weeks ago, when one disk (hdk) in the array
failed. After looking at the connectors I refitted the connector to the
drive (it seems to have been a weak connection). The resync began when the
system was rebooted, but in the middle of the resync a second drive (hdg)
developed a problem: a couple of blocks are unreadable *sick*. The array
went down and it seems that all data is lost. This is not a real problem
since the array is only used for a personal VDR.
But I thought this would be a good time to start fiddling with the RAID to
see if there is a chance to rescue some data. I first made a backup of each
drive with "dd if=/dev/hde | gzip -1 > hde.gz". After googling around for a
while I found
<http://www.tldp.org/HOWTO/Software-RAID-HOWTO-8.html#ss8.1> but the
instructions there don't work. I even tried to recreate the array as
suggested on various mailing lists. My last attempt used
mdadm-2.0-devel-2 with the patch from 14.07.2005
(<http://www.opensubscriber.com/message/linux-raid@xxxxxxxxxxxxxxx/1737664.html>)
from this mailing list. Sometimes I was able to recreate the array, but
when I try to mount it there seems to be no valid ext3 filesystem on it.
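A minimal sketch of the per-drive backup as a shell function, in case it is
useful to anyone repeating this. The function name backup_disk, the 4k block
size and the conv=noerror,sync options are assumptions added here and were
not part of the command quoted above; they are meant to let dd continue past
unreadable sectors on a drive like hdg instead of stopping at the first
error.

```shell
# Hypothetical helper (not from the original command): image one drive into
# a gzip-compressed file, continuing past unreadable sectors.
#   $1 = source device or file
#   $2 = output .gz file
#   $3 = dd block size (optional, defaults to 4k)
backup_disk() {
    src=$1
    dst=$2
    bs=${3:-4k}
    # conv=noerror: keep going after a read error instead of aborting.
    # conv=sync:    pad each short or failed read with zeros up to the block
    #               size, so the data that follows keeps its original offset.
    dd if="$src" bs="$bs" conv=noerror,sync | gzip -1 > "$dst"
}

# Example (as root, with enough free space for all four images):
#   for d in hde hdg hdi hdk; do backup_disk /dev/$d $d.gz; done
```

Because of conv=sync the image can be slightly longer than the source when
the source size is not a multiple of the block size, and every unreadable
block shows up as zeros in the image.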
So here is the list of events that caused the raid failure:
1) hdk went down due to a connector problem.
2) powered off the machine and refitted the connector.
3) powered on; the resync started.
4) hdg failed with some unreadable sectors (according to kern.log).
5) md0 went down.
Is there anything else I can do to rescue the data? I assume you need more
"input", but I don't think it's a good idea to post even more logs to the
list, so please ask if something is missing.
The output below is from "mdadm --examine" (mdadm-2.0-devel-2). What I don't
understand is why the "Spare Devices" counts differ between the disks.
---***---
/dev/hde1:
Magic : a92b4efc
Version : 00.90.01
UUID : 89d60b87:f4132b59:c073bd02:53de0ef9
Creation Time : Tue Dec 28 12:24:48 2004
Raid Level : raid5
Device Size : 80043136 (76.34 GiB 81.96 GB)
Raid Devices : 4
Total Devices : 4
Preferred Minor : 0
Update Time : Sat Jul 23 20:23:19 2005
State : clean
Active Devices : 2
Working Devices : 3
Failed Devices : 2
Spare Devices : 1
Checksum : c5646fe8 - expected c6586ef4
Events : 0.4340017
Layout : left-symmetric
Chunk Size : 32K
Number Major Minor RaidDevice State
this 3 33 1 3 active sync /dev/hde1
0 0 0 0 524288 spare
1 3670016 65536 65536 393216 spare
2 0 0 131072 589824 spare
3 2162688 65536 196608 393216 spare
4 3735552 65536 262144 0 spare
---***---
---***---
/dev/hdg1:
Magic : a92b4efc
Version : 00.90.00
UUID : 7b631138:ca5ac82b:95f1b9df:25e26bff
Creation Time : Fri Aug 5 11:55:02 2005
Raid Level : raid5
Device Size : 80043136 (76.34 GiB 81.96 GB)
Raid Devices : 4
Total Devices : 4
Preferred Minor : 0
Update Time : Fri Aug 5 11:55:02 2005
State : clean
Active Devices : 3
Working Devices : 3
Failed Devices : 1
Spare Devices : 0
Checksum : 35699ae6 - correct
Events : 0.1
Layout : left-symmetric
Chunk Size : 32K
Number Major Minor RaidDevice State
this 1 34 1 1 active sync /dev/hdg1
0 0 33 1 0 active sync /dev/hde1
1 1 34 1 1 active sync /dev/hdg1
2 2 56 1 2 active sync /dev/hdi1
3 3 0 0 3 faulty
---***---
/dev/hdi1:
Magic : a92b4efc
Version : 00.90.01
UUID : 89d60b87:f4132b59:c073bd02:53de0ef9
Creation Time : Tue Dec 28 12:24:48 2004
Raid Level : raid5
Device Size : 80043136 (76.34 GiB 81.96 GB)
Raid Devices : 4
Total Devices : 4
Preferred Minor : 0
Update Time : Sat Jul 23 20:23:19 2005
State : clean
Active Devices : 2
Working Devices : 3
Failed Devices : 2
Spare Devices : 1
Checksum : c564701b - correct
Events : 0.4340017
Layout : left-symmetric
Chunk Size : 32K
Number Major Minor RaidDevice State
this 1 56 1 1 active sync /dev/hdi1
0 0 0 0 0 removed
1 1 56 1 1 active sync /dev/hdi1
2 2 34 1 2 active sync /dev/hdg1
3 3 33 1 3 active sync /dev/hde1
4 4 57 1 4 spare /dev/hdk1
---***---
---***---
/dev/hdk1:
Magic : a92b4efc
Version : 00.90.01
UUID : 89d60b87:f4132b59:c073bd02:53de0ef9
Creation Time : Tue Dec 28 12:24:48 2004
Raid Level : raid5
Device Size : 80043136 (76.34 GiB 81.96 GB)
Raid Devices : 4
Total Devices : 4
Preferred Minor : 0
Update Time : Sat Jul 23 20:23:19 2005
State : clean
Active Devices : 2
Working Devices : 3
Failed Devices : 2
Spare Devices : 1
Checksum : c5646ffc - correct
Events : 0.4340017
Layout : left-symmetric
Chunk Size : 32K
Number Major Minor RaidDevice State
this 4 57 1 4 spare /dev/hdk1
0 0 0 0 0 removed
1 1 56 1 1 active sync /dev/hdi1
2 2 0 0 2 faulty removed
3 3 33 1 3 active sync /dev/hde1
4 4 57 1 4 spare /dev/hdk1
---***---
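To compare the superblocks above at a glance, the dumps can be reduced to
device, UUID and event counter. This is only a rough sketch that assumes the
"Field : value" layout shown above; the function name summarize_examine is
made up for illustration.

```shell
# Hypothetical helper: read "mdadm --examine" output on stdin and print one
# line per device in the form "<device> <UUID> <Events>".
summarize_examine() {
    awk '
        # A line such as "/dev/hde1:" starts a new superblock dump.
        /^\/dev\// { sub(/:$/, "", $1); dev = $1 }
        # "UUID : <value>" -> remember the array UUID of this device.
        $1 == "UUID"   { uuid = $3 }
        # "Events : <value>" -> all fields are known, print the summary.
        $1 == "Events" { print dev, uuid, $3 }
    '
}

# Example (as root):
#   for d in /dev/hde1 /dev/hdg1 /dev/hdi1 /dev/hdk1; do
#       mdadm --examine "$d"
#   done | summarize_examine
```

Applied to the dumps above, this would show hdg1 with a different UUID and
an event count of 0.1, while the other three disks agree on 0.4340017.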
--
Claas Hilbrecht
http://www.jucs-kramkiste.de