Re: RAID5 recovering

Robin Hill <robin@xxxxxxxxxxxxxxx> · Mon, 15 Apr 2013 16:19:39 +0100

On Mon Apr 15, 2013 at 03:47:39PM +0200, Pierre Martineau wrote:

> Dear RAID experts,
> 
> I have a RAID5 volume that recently crashed, and I need your advice 
> before doing anything irreversible.
> 
> Let me first summarize the past and current state.
> 
> 1) I had a nicely running RAID5 volume on 3 x 1 TB disks (LVM on top, 
> with several logical volumes formatted as ext3 and ext4), but it was 
> getting a bit too small, so I decided to add a new 1 TB disk.
> 
Given the rebuild time for a 1 TB disk, I'd be wary of running RAID5. If
you have the space, adding another disk and going to RAID6 will be much
safer.
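For reference, mdadm converts RAID5 to RAID6 by first adding the extra disk and then reshaping with --grow. The sketch below is minimal and rests on several assumptions. The array is assumed to be /dev/md0. The fifth disk's partition, /dev/sdf1, is a hypothetical name, as are the helper names and the backup-file path. Nothing runs unless DRY_RUN=0 is set; otherwise the commands are only printed:

```shell
#!/bin/sh
# Sketch only: device names, helper names and the backup path are hypothetical.
# With DRY_RUN unset or 1 the commands are printed, not executed.

run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# raid5_to_raid6 ARRAY NEW_MEMBER TOTAL_MEMBERS BACKUP_FILE
# TOTAL_MEMBERS is the member count after the change; going from a 4-disk
# RAID5 to a 5-disk RAID6 keeps the same usable capacity (3 data disks).
# BACKUP_FILE must not live on ARRAY itself.
raid5_to_raid6() {
    md=$1 new=$2 total=$3 backup=$4
    run mdadm "$md" --add "$new"
    run mdadm --grow "$md" --level=6 --raid-devices="$total" --backup-file="$backup"
}
```

For example, `raid5_to_raid6 /dev/md0 /dev/sdf1 5 /root/md0-reshape.bak` prints the two commands. The reshape itself can take a long time on 1 TB disks, and its progress shows up in /proc/mdstat.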

> 2) I installed the new disk and did not do anything else for a couple 
> of days (the RAID was still running on 3 disks).
> 
> 3) One of the old disks failed and was ejected from the RAID.
> 
> 4) The ejected disk was not even present as /dev/sdX, so I checked the 
> connections and the disk came back.
> 
> 5) I resynced the ejected disk and was back to my original 3-disk array.
> 
> 6) I waited 2-3 days and everything was fine. I then added the new disk 
> and resynced.
> 
> 7) I now had a running 4-disk RAID5 array. I created a new volume and 
> started copying data onto it.
> 
> 8) Over the weekend, 2 disks were ejected from the array: the newly 
> installed one and the same one as before (step 3).
> 
> 9) Again, the 2 disks were not present as /dev/sdX. I checked the 
> connections again and found that the problem was a Molex connector. The 
> two ejected disks were on the same Molex, which explains why both were 
> detected as faulty.
> 
> Now, here is my list of mistakes as a newbie.
> 
> 4) I did not save all the information before proceeding (mdadm 
> --examine, /etc/mdadm/mdadm.conf, syslog, ...).
> 
> 5) I tried to assemble the disks with
> mdadm --assemble --scan
> without success.
> 
> 6) I then tried the following, and I think this was my big mistake!
> mdadm --assemble /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1
> 
> I forgot to put /dev/md0 after --assemble in this command.
> Because of this, the /dev/sdb1 superblock was removed, and now 
> mdadm --examine /dev/sdb1 returns "No md superblock detected on /dev/sdb1".
> 
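mdadm's assemble mode expects the array name before the component devices (`mdadm --assemble md-device component-devices...`). In the command above, /dev/sdb1 was therefore taken as the array to assemble rather than as a member. The guard below refuses to run unless the first argument looks like an md device. It is a hypothetical wrapper, not part of mdadm, and like the other sketches here it only prints the command unless DRY_RUN=0:

```shell
#!/bin/sh
# Hypothetical guard around "mdadm --assemble"; prints the command unless DRY_RUN=0.

run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# safe_assemble ARRAY MEMBER...
safe_assemble() {
    md=$1
    case $md in
        /dev/md*) ;;   # /dev/md0, /dev/md/name, ...
        *)
            echo "refusing: first argument must be the md device, got '$md'" >&2
            return 1
            ;;
    esac
    shift
    run mdadm --assemble "$md" "$@"
}
```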
> I would now like to be more cautious. If an expert on the list would be 
> kind enough to tell me whether the method proposed below is the right 
> approach, I will be grateful for the rest of my life :-)
> 
> 7) I read the RAID wiki and the list.
> 
> 8) I saved
> mdadm --examine /dev/sd[bcde]1
> dmesg
> syslog
> /etc/mdadm/mdadm.conf
> fdisk -lu /dev/sd[bcde]
> 
> I have put the contents of these files at the end of this message 
> (except dmesg and syslog, which are very long).
> 
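Collecting this information before touching anything is cheap insurance. The helper below is a minimal sketch that captures the same items into one directory; the function name and output file names are hypothetical. That directory should be on a disk that is not part of the array. Errors are recorded in the output files instead of aborting the script, for example when the syslog file is missing. /proc/mdstat is an addition to the original list:

```shell
#!/bin/sh
# Hypothetical helper: snapshot RAID diagnostics before any recovery attempt.
# Every command here only reads; none of them writes to the member disks.

# save_raid_state [OUTPUT_DIR]
save_raid_state() {
    dir=${1:-raid-state-$(date +%Y%m%d-%H%M%S)}
    mkdir -p "$dir" || return 1

    # Each command's stdout and stderr go to its own file, so a failure
    # (missing device, missing tool, no permission) is recorded, not fatal.
    mdadm --examine /dev/sd[bcde]1 > "$dir/mdadm-examine.txt" 2>&1
    cat /proc/mdstat               > "$dir/mdstat.txt"        2>&1
    dmesg                          > "$dir/dmesg.txt"         2>&1
    fdisk -lu /dev/sd[bcde]        > "$dir/fdisk.txt"         2>&1
    cp /var/log/syslog /etc/mdadm/mdadm.conf "$dir/" 2> "$dir/copy-errors.txt"

    printf '%s\n' "$dir"
}
```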
> 9) /dev/sdd is the new disk; this is clear from the fdisk listing, since 
> it is a 4K-sector disk.
> The normal order of the array members (see mdadm --examine 
> /dev/sd[de]1) is:
> sdb1 sdc1 sde1 sdd1
> 
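The kernel also reports each disk's sector sizes under /sys/block/<disk>/queue/ (logical_block_size and physical_block_size). This is a quick way to confirm which member is the 4K-sector disk. The sketch below is minimal. The function name is hypothetical, and so is the SYSFS_ROOT override, which exists only so the helper can be tried against a fake directory:

```shell
#!/bin/sh
# Hypothetical helper: print logical/physical sector sizes for the given disks.
# SYSFS_ROOT defaults to /sys; overriding it is only useful for testing.

# sector_sizes DISK...   (kernel names such as sdb, not /dev/sdb)
sector_sizes() {
    root=${SYSFS_ROOT:-/sys}
    for d in "$@"; do
        q=$root/block/$d/queue
        printf '%s logical=%s physical=%s\n' "$d" \
            "$(cat "$q/logical_block_size")" "$(cat "$q/physical_block_size")"
    done
}
```

Running `sector_sizes sdb sdc sdd sde` prints one line per disk. A disk reporting physical=4096 has 4K physical sectors.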
> 10) The event counts are:
> /dev/sdb1: no md superblock (see 6)
> /dev/sdc1: Events : 112358
> /dev/sdd1: Events : 112333
> /dev/sde1: Events : 112358
> 
> It seems that sdd was the first disk removed.
> Presumably sdb1 is in sync, since it was still running with sdc1 when 
> sdd1 and sde1 were ejected from the array (see 8), but I can't be sure 
> because I stupidly erased its superblock!
> 
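The Events counters are central here. Members whose count matches the highest one are current, and a lower count means that member missed some updates. The awk-based sketch below reads saved mdadm --examine output and reports how far behind each member is; the function name is hypothetical. It assumes plain integer Events values, as shown above, and reports members with no superblock as "none":

```shell
#!/bin/sh
# Hypothetical helper: summarise "Events" counters from mdadm --examine output.
# Reads the files given as arguments, or standard input if there are none.

events_by_device() {
    awk '
        # A line such as "/dev/sdc1:" starts the section for one member.
        /^\/dev\/[^ ]*:$/ { dev = substr($0, 1, length($0) - 1) }
        # "Events : 112358" inside that section.
        /^[ \t]*Events :/ { ev[dev] = $3; if ($3 + 0 > max) max = $3 + 0 }
        # mdadm reports a missing superblock on stderr, e.g. when saved with 2>&1.
        /No md superblock detected on/ { d = $NF; sub(/\.$/, "", d); none[d] = 1 }
        END {
            for (d in ev)   printf "%s %s behind=%d\n", d, ev[d], max - ev[d]
            for (d in none) printf "%s none\n", d
        }
    ' "$@" | sort
}
```

With the counts from step 10, this flags sdd1 as 25 events behind sdc1 and sde1. That fits sdd having dropped out first.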
> 11) I propose to re-create the array with the --assume-clean option, 
> then check everything using "fsck -n" and "mount -o ro".
> The command would be:
> 
> mdadm --create /dev/md0 -e 0.90 --assume-clean --level=5 --raid-devices=4 \
> --chunk=64 --size=976759936 /dev/sdb1 /dev/sdc1 /dev/sde1 /dev/sdd1
> 
<-- snip -->

Have you tried force-assembling the array first? Recreating the array
is risky, so it should be avoided if possible. First try:
  mdadm -Af /dev/md0 /dev/sd[cde]1
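Spelled out, the force-assemble step first makes sure nothing is half-assembled. It then assembles with --force from the three members that still have superblocks (-Af is short for --assemble --force). The sketch below is minimal and uses a hypothetical helper name; it prints the commands unless DRY_RUN=0:

```shell
#!/bin/sh
# Hypothetical helper; prints the commands unless DRY_RUN=0.

run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# force_assemble ARRAY MEMBER...
force_assemble() {
    md=$1; shift
    # Stop any partially assembled array first. If nothing is running this
    # fails harmlessly, and the function carries on either way.
    run mdadm --stop "$md"
    run mdadm --assemble --force "$md" "$@"
}
```

For example, `DRY_RUN=0 force_assemble /dev/md0 /dev/sd[cde]1` followed by `cat /proc/mdstat` should show whether md0 came up degraded on three of its four members.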

If that works, you'll then need to re-add (and rebuild) /dev/sdb1. If it
doesn't, make sure the array is stopped, rerun the command with "-vvv"
added for extra verbosity, and send through the output from that along
with anything relevant from dmesg.
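Both branches can be sketched as follows, under a few assumptions:

- The helper names are hypothetical.
- The logical volume path /dev/vg0/data is hypothetical.
- The read-only filesystem check ("fsck -n" from the original plan) runs before the fourth disk is added, so the rebuild only starts once the data looks sane.
- Because /dev/sdb1 no longer has a superblock, a plain --add (which triggers a full rebuild) is presumably what is needed rather than --re-add.

Commands are printed unless DRY_RUN=0:

```shell
#!/bin/sh
# Hypothetical helpers; print the commands unless DRY_RUN=0.

run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Branch 1, assembly worked: check the data read-only, then add the blank member.
# verify_and_add ARRAY LV_PATH MEMBER
verify_and_add() {
    md=$1 lv=$2 dev=$3
    run vgchange -ay              # activate all LVM volume groups, incl. the one on the array
    run fsck -n "$lv"             # -n: report problems, answer "no" to every fix
    run mdadm "$md" --add "$dev"  # full rebuild onto the member without a superblock
}

# Branch 2, assembly failed: stop the array and retry with extra verbosity.
# verbose_retry ARRAY MEMBER...
verbose_retry() {
    md=$1; shift
    run mdadm --stop "$md"
    run mdadm --assemble --force -vvv "$md" "$@"
}
```

For the failing case, `DRY_RUN=0 verbose_retry /dev/md0 /dev/sd[cde]1 2>&1 | tee assemble.log` keeps a copy of mdadm's messages to post to the list.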

HTH,
    Robin