replacing failed hard drives in RAID 5 configuration

Hi,

I am running a server that has four 250 GB hard drives in a RAID 5 configuration. Recently, two of the hard drives failed. I copied the data bit for bit from one of the failed drives (/dev/hdc1) to a new drive (/dev/hdd1) using dd_rescue (http://www.garloff.de/kurt/linux/ddrescue/). The failed drive had about 300 bad blocks (I checked using the badblocks utility). Because two drives have failed, the array (/dev/md0) won't start.

I tried to add the new hard drive (/dev/hdd1) to the RAID using mdadm. I kept the failed hard drive (/dev/hdc1) in the machine. The other two functional hard drives are /dev/hdg1 and /dev/hdh1. Initially I tried starting the array with 'raidstart'. When I did this, I got the following error messages in /var/log/messages:

Oct 11 14:41:15 server-name kernel: md: invalid raid superblock magic on hdd1
Oct 11 14:41:15 server-name kernel: md: hdd1 has invalid sb, not importing!
Oct 11 14:41:15 server-name kernel: md: could not import hdd1, trying to run array nevertheless.
Oct 11 14:41:15 server-name kernel: [events: 00000017]
Oct 11 14:41:15 server-name kernel: [events: 00000017]
Oct 11 14:41:15 server-name kernel: md: autorun ...
Oct 11 14:41:15 server-name kernel: md: considering hdh1 ...
Oct 11 14:41:15 server-name kernel: md: adding hdh1 ...
Oct 11 14:41:15 server-name kernel: md: adding hdg1 ...
Oct 11 14:41:15 server-name kernel: md: adding hdc1 ...
Oct 11 14:41:15 server-name kernel: md: created md0
Oct 11 14:41:15 server-name kernel: md: bind<hdc1,1>
Oct 11 14:41:15 server-name kernel: md: bind<hdg1,2>
Oct 11 14:41:15 server-name kernel: md: bind<hdh1,3>
Oct 11 14:41:15 server-name kernel: md: running: <hdh1><hdg1><hdc1>
Oct 11 14:41:15 server-name kernel: md: hdh1's event counter: 00000017
Oct 11 14:41:15 server-name kernel: md: hdg1's event counter: 00000017
Oct 11 14:41:15 server-name kernel: md: hdc1's event counter: 0000000f
Oct 11 14:41:15 server-name kernel: md: superblock update time inconsistency -- using the most recent one
Oct 11 14:41:15 server-name kernel: md: freshest: hdh1
Oct 11 14:41:15 server-name kernel: md: kicking non-fresh hdc1 from array!
Oct 11 14:41:15 server-name kernel: md: unbind<hdc1,2>
Oct 11 14:41:15 server-name kernel: md: export_rdev(hdc1)
Oct 11 14:41:15 server-name kernel: md0: removing former faulty hdd1!
Oct 11 14:41:15 server-name kernel: md0: max total readahead window set to 768k
Oct 11 14:41:15 server-name kernel: md0: 3 data-disks, max readahead per data-disk: 256k
Oct 11 14:41:15 server-name kernel: raid5: device hdh1 operational as raid disk 3
Oct 11 14:41:15 server-name kernel: raid5: device hdg1 operational as raid disk 2
Oct 11 14:41:15 server-name kernel: raid5: not enough operational devices for md0 (2/4 failed)
Oct 11 14:41:15 server-name kernel: RAID5 conf printout:
Oct 11 14:41:15 server-name kernel: --- rd:4 wd:2 fd:2
Oct 11 14:41:15 server-name kernel: disk 0, s:0, o:0, n:0 rd:0 us:1 dev:[dev 00:00]
Oct 11 14:41:15 server-name kernel: disk 1, s:0, o:0, n:1 rd:1 us:1 dev:[dev 00:00]
Oct 11 14:41:15 server-name kernel: disk 2, s:0, o:1, n:2 rd:2 us:1 dev:hdg1
Oct 11 14:41:15 server-name kernel: disk 3, s:0, o:1, n:3 rd:3 us:1 dev:hdh1
Oct 11 14:41:15 server-name kernel: raid5: failed to run raid set md0
Oct 11 14:41:15 server-name kernel: md: pers->run() failed ...
Oct 11 14:41:15 server-name kernel: md :do_md_run() returned -22
Oct 11 14:41:15 server-name kernel: md: md0 stopped.
Oct 11 14:41:15 server-name kernel: md: unbind<hdh1,1>
Oct 11 14:41:15 server-name kernel: md: export_rdev(hdh1)
Oct 11 14:41:15 server-name kernel: md: unbind<hdg1,0>
Oct 11 14:41:15 server-name kernel: md: export_rdev(hdg1)
Oct 11 14:41:15 server-name kernel: md: ... autorun DONE.
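
For reference, the event counters reported in the log above can be compared with a short script. This is only a sketch: the regular expression, the function names (event_counters, stale_devices) and the embedded LOG sample are assumptions made for illustration, and the counters are read as hexadecimal because of the value 0000000f. It shows that hdc1 is behind hdg1 and hdh1, which matches the "kicking non-fresh hdc1" message.

```python
import re

# Three lines copied from the kernel log above (timestamps trimmed).
LOG = """\
md: hdh1's event counter: 00000017
md: hdg1's event counter: 00000017
md: hdc1's event counter: 0000000f
"""

# Hypothetical pattern for the "event counter" lines; not part of any md tool.
EVENT_RE = re.compile(r"md: (\w+)'s event counter: ([0-9a-fA-F]+)")


def event_counters(log_text):
    """Map each device name to its event counter, parsed as hexadecimal."""
    return {dev: int(count, 16) for dev, count in EVENT_RE.findall(log_text)}


def stale_devices(counters):
    """Return the devices whose counter is lower than the newest one."""
    newest = max(counters.values())
    return sorted(dev for dev, n in counters.items() if n < newest)


counters = event_counters(LOG)
print(counters)
print(stale_devices(counters))
```

The same comparison could be run on a full /var/log/messages excerpt, since the pattern ignores everything before "md:".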


I also tried to run the array using mdadm: 'mdadm --assemble --scan /dev/md0 /dev/hdc1 /dev/hdd1 /dev/hdg1 /dev/hdh1'. However, doing this gave me the error message "Segmentation Fault".

Can anybody help me replace the old hard drive (/dev/hdc1) with the new hard drive (/dev/hdd1), which holds the data copied from the old drive?

Thanks,
Saurabh Barve.
Systems Administrator
Department of Atmospheric Science, Colorado State University

