Hugh E Cruickshank wrote:
Normally with software mirroring you would mirror partitions, not
drives. What does "cat /proc/mdstat" say about them?
You are correct. I keep falling back to thinking in the "MegaRAID" way,
where I have the drives mirrored at the controller level and then
partitioned at the software level. /proc/mdstat reports:
Personalities : [raid0] [raid1]
md1 : active raid1 sde2[1] sda2[2](F)
      8193024 blocks [2/1] [_U]

md2 : active raid1 sde3[1] sda3[2](F)
      2048192 blocks [2/1] [_U]

md3 : active raid1 sde5[1] sda5[2](F)
      25085376 blocks [2/1] [_U]

md4 : active raid1 sdf1[1] sdb1[0]
      35840896 blocks [2/2] [UU]

md5 : active raid1 sdg1[1] sdc1[0]
      35840896 blocks [2/2] [UU]

md6 : active raid1 sdh1[1] sdd1[0]
      35840896 blocks [2/2] [UU]

md7 : active raid0 sdn1[5] sdm1[4] sdl1[3] sdk1[2] sdj1[1] sdi1[0]
      213261312 blocks 256k chunks

md0 : active raid1 sde1[1] sda1[2](F)
      513984 blocks [2/1] [_U]
OK, you just have to replace the drive, fdisk matching partitions on it
("fdisk -l /dev/sde" will show the sizes you need), then use
mdadm --add /dev/md? /dev/sda?
for each one to add the missing partition back. Then reinstall grub on
the drive.
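Something like this end to end, assuming the replacement disk comes
back as /dev/sda and you give its partitions the same type "fd" (Linux
raid autodetect) as the originals - a sketch, not gospel:

    # copy the partition layout over from the good mirror
    # (a shortcut for recreating it by hand in fdisk)
    sfdisk -d /dev/sde | sfdisk /dev/sda

    # re-add each partition to its array, per the mdstat above
    mdadm --add /dev/md0 /dev/sda1
    mdadm --add /dev/md1 /dev/sda2
    mdadm --add /dev/md2 /dev/sda3
    mdadm --add /dev/md3 /dev/sda5

    # watch the resync
    cat /proc/mdstat

    # put grub back on the new drive's MBR
    grub-install /dev/sda

If any array still lists the old member as failed (F), remove it first
with "mdadm /dev/mdX --remove /dev/sdaY" before the --add.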
You have an odd combination of drives... Normally you would want to
mirror the partitions on the first 2 disks and install grub on both, in
which case the system would still boot. Some of the more sophisticated
controllers can boot from more than the first 2, though. Anyway, you
should be able to boot from your install CD with 'linux rescue' at the
boot prompt and get to a point where you can fix things.
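Roughly, that path looks like this (device names are illustrative -
check what the disks are called once you're in rescue mode):

    boot: linux rescue

    # let the rescue environment find and mount the installed
    # system, then switch into it
    chroot /mnt/sysimage

    # reinstall grub on whichever drive the BIOS will boot from
    grub-install /dev/sda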
The odd combination of drives was actually intentional on my part. The
idea was to provide "separation" between the mirrors. While I did not
have separate controllers, I thought that using the separate channels
on the common controller might provide a shade more resiliency. It was
my first attempt at setting up mirrored pairs on a non-MegaRAID SCSI
controller. Live and learn!
The controller might let you boot from the 2nd channel - and if that's
the case you could install grub on /dev/sde before shutting down, adjust
the controller BIOS, and still be able to boot. The catch is that you
won't know whether it will work until after you shut down.
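To do that ahead of time you can use the grub shell's device remapping,
which writes to sde's MBR as though it were the boot disk - this
assumes md0 (the sde1/sda1 mirror) is your /boot:

    grub
    grub> device (hd0) /dev/sde
    grub> root (hd0,0)
    grub> setup (hd0)
    grub> quit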
I will read up on "linux rescue" so that, if I have to fall back on
this method, I will have a firm plan in place before I start the work.
The only tricky part is what happens to the drive names if you boot with
/dev/sda broken (depending on the failure mode) or missing. If the
controller doesn't see it, all of the other drive names will shift up.
This normally won't affect md device detection, but you may have a
non-md device mentioned in /etc/fstab, especially for swap devices.
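One way to guard against that is to label the swap area and refer to
the label in /etc/fstab, so the entry survives a name shift. A sketch,
with placeholder names - point it at wherever swap actually lives:

    swapoff /dev/md2                 # placeholder device name
    mkswap -L SWAP0 /dev/md2

    # then in /etc/fstab:
    LABEL=SWAP0   swap   swap   defaults   0 0

    swapon -a                        # re-enable swap via the new entry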
This particular system is our primary development system and does not
get all the "fancy" hardware that our production systems do. I have
configured the production systems using only the MegaRAID controllers,
and there it is a "no-brainer" to replace failed drives - just swap
the drive and away you go.
It isn't that complicated to fdisk a partition and mdadm --add it, and
with software raid1 you gain the ability to plug any remaining single
drive into any vendor's SCSI controller and access the data.
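For example, a single raid1 member moved to another machine can be
assembled degraded and mounted (names illustrative):

    # force a one-disk raid1 to start despite the missing mirror
    mdadm --assemble --run /dev/md0 /dev/sdb1
    mount /dev/md0 /mnt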
--
Les Mikesell
lesmikesell@xxxxxxxxx
_______________________________________________
CentOS mailing list
CentOS@xxxxxxxxxx
http://lists.centos.org/mailman/listinfo/centos