Re: Replacing failed software RAID drive




Hugh E Cruickshank wrote:

Normally with software mirroring you would mirror partitions, not drives. What does "cat /proc/mdstat" say about them?

You are correct. I keep falling back to thinking the "MegaRAID" way
where I have the drives mirrored at the controller level and then
partitioned at the software level. The /proc/mdstat reports:

Personalities : [raid0] [raid1]
md1 : active raid1 sde2[1] sda2[2](F)
      8193024 blocks [2/1] [_U]

md2 : active raid1 sde3[1] sda3[2](F)
      2048192 blocks [2/1] [_U]

md3 : active raid1 sde5[1] sda5[2](F)
      25085376 blocks [2/1] [_U]

md4 : active raid1 sdf1[1] sdb1[0]
      35840896 blocks [2/2] [UU]

md5 : active raid1 sdg1[1] sdc1[0]
      35840896 blocks [2/2] [UU]

md6 : active raid1 sdh1[1] sdd1[0]
      35840896 blocks [2/2] [UU]

md7 : active raid0 sdn1[5] sdm1[4] sdl1[3] sdk1[2] sdj1[1] sdi1[0]
      213261312 blocks 256k chunks

md0 : active raid1 sde1[1] sda1[2](F)
      513984 blocks [2/1] [_U]
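
As a rough sketch of how to read the output above: an underscore in the final bracket pair (for example [_U]) marks a missing mirror half. The helper below lists the arrays in that state from mdstat-format text on standard input. Its name, degraded_arrays, is made up for illustration and does not come from the thread.

```shell
#!/bin/sh
# List md arrays that are running with a missing or failed member.
# Reads mdstat-format text on stdin; prints one array name per line.
# (degraded_arrays is a hypothetical helper name, not an existing tool.)
degraded_arrays() {
    awk '
        # An array header looks like "md1 : active raid1 sde2[1] sda2[2](F)".
        /^md[0-9]+ :/ { name = $1 }
        # The status line ends in something like "[2/1] [_U]"; an underscore
        # inside the final brackets marks a missing member.
        name != "" && /\[[U_]+\][ \t]*$/ {
            if ($NF ~ /_/) print name
            name = ""
        }
    '
}
```

On a live system it would be fed with "degraded_arrays < /proc/mdstat". RAID0 arrays such as md7 have no [UU]-style status field and are never reported.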

OK, you just have to replace the drive, use fdisk to create matching partitions on it ("fdisk -l /dev/sde" will show the sizes you need), then run
mdadm --add /dev/md? /dev/sda?
once for each array (substituting the array and partition numbers) to add the missing partition back. Then reinstall grub on the new drive.
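
The re-add step above can be sketched as a dry run that only prints the mdadm commands, one per failed member, so they can be reviewed first. This assumes the replacement drive comes up under the same name as the failed one (sda here), which the thread does not guarantee. The function name readd_plan is hypothetical.

```shell
#!/bin/sh
# Print (do not run) one "mdadm --add" command per failed member found in
# mdstat-format text on stdin. Assumption: the new drive will reuse the
# failed drive's device name. readd_plan is a made-up helper name.
#
# The new drive must be partitioned before any of these commands are run,
# either by hand in fdisk as described above or, as one common shortcut not
# mentioned in the thread, with: sfdisk -d /dev/sde | sfdisk /dev/sda
readd_plan() {
    awk '
        /^md[0-9]+ :/ {
            for (i = 3; i <= NF; i++) {
                # Failed members carry an "(F)" suffix, e.g. "sda2[2](F)".
                if ($i ~ /\(F\)$/) {
                    part = $i
                    sub(/\[.*/, "", part)   # "sda2[2](F)" -> "sda2"
                    print "mdadm --add /dev/" $1 " /dev/" part
                }
            }
        }
    '
}
```

Running "readd_plan < /proc/mdstat" and saving the output before the failed drive is pulled is probably the safest order, since the (F) entries are likely to disappear from /proc/mdstat once the system has been restarted without that drive.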

You have an odd combination of drives... Normally you would want to mirror the partitions on the first 2 disks and install grub on both, in which case the system would still boot. Some of the more sophisticated controllers can boot from more than the first 2, though. Anyway, you should be able to boot from your install CD with 'linux rescue' at the boot prompt and get to a point where you can fix things.


The odd combination of drives was actually intentional on my part. The
idea was to provide "separation" between the mirrors. While I did not
have separate controllers I thought that using the separate channels on
the common controller might provide a shade more resiliency. It was
my first attempt at setting up mirrored pairs on a non-MegaRAID SCSI
controller. Live and learn!

The controller might let you boot from the 2nd channel - and if that's the case you could install grub on /dev/sde before shutting down, adjust the controller BIOS, and still be able to boot. The catch is that you won't know if it will work until after you shut down.
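
For the "install grub on /dev/sde" step, a minimal sketch using the GRUB legacy shell follows. It assumes GRUB legacy (the version shipped with CentOS releases of that era) and assumes /boot is the first partition on the disk, which is a guess based on md0 being the small sde1/sda1 mirror. The function only prints the commands; grub_commands is a hypothetical name.

```shell
#!/bin/sh
# Print GRUB legacy shell commands that install the boot loader on the given
# disk while mapping it as the first BIOS disk (hd0), so the disk can boot
# on its own once the controller presents it first.
# $1 = disk device, $2 = zero-based index of the /boot partition (default 0;
# an assumption, check which partition actually holds /boot).
grub_commands() {
    disk=$1
    part=${2:-0}
    printf 'device (hd0) %s\n' "$disk"
    printf 'root (hd0,%s)\n' "$part"
    printf 'setup (hd0)\n'
    printf 'quit\n'
}
```

After checking the printed commands, they could be applied with "grub_commands /dev/sde | grub --batch".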

I will read up on "linux rescue" so that, if I have to fall back on this
method, I will have a firm plan in place before I start the
work.

The only tricky part is what happens to the drive names if you boot with /dev/sda broken (depending on the failure mode) or missing. If the controller doesn't see it, all of the other drive names will shift up (sdb becomes sda, and so on). This normally won't affect md device detection, but you may have a non-md device mentioned in /etc/fstab, especially for swap devices.
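
One way to check for such entries ahead of time is sketched below: it prints fstab lines whose device field names a raw /dev/sd* device instead of an md device or a label. The function name raw_sd_entries is made up for illustration.

```shell
#!/bin/sh
# Print the device, mount point and type of fstab entries whose device field
# is a raw SCSI disk or partition (/dev/sdX or /dev/sdXN). These are the
# entries that would point at the wrong disk if drive names shift.
# (raw_sd_entries is a hypothetical helper name.)
raw_sd_entries() {
    awk '
        /^[ \t]*#/ { next }                                   # skip comments
        $1 ~ /^\/dev\/sd[a-z]+[0-9]*$/ { print $1, $2, $3 }
    '
}
```

It would be run as "raw_sd_entries < /etc/fstab"; anything it prints is worth reviewing before the reboot.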

This particular system is our primary development system and does not
get all the "fancy" hardware that our production systems do. I have
configured the production systems using only the MegaRAID controllers
and there it is a "no brainer" to replace failed drives - just swap
the drive and away you go.

It isn't that complicated to fdisk a partition and mdadm --add it, and with software RAID1 you gain the ability to plug any remaining single drive into any vendor's SCSI controller and access the data.

--
  Les Mikesell
   lesmikesell@xxxxxxxxx

