Re: Recovering failed array

On Thu, 22 Sep 2011 18:39:10 -0400 Alex <mysqlstudent@xxxxxxxxx> wrote:

> Hi,
> 
> >> Personalities : [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] [linear]
> >> md1 : inactive sda2[0] sdd2[4](S) sdb2[1]
> >>       205820928 blocks super 1.1
> >>
> >> md0 : active raid1 sda1[0] sdd1[3] sdc1[2] sdb1[1]
> >>       255988 blocks super 1.0 [4/4] [UUUU]
> >>
> >>
> >> # mdadm --add /dev/md1 /dev/sdd2
> >> mdadm: Cannot open /dev/sdd2: Device or resource busy
> >>
> >> # mdadm --run /dev/md1
> >> mdadm: failed to run array /dev/md1: Input/output error
> >>
> >> I've tried "--assemble --scan" and it also produces an I/O error.
> >>
> >> mdadm.conf:
> >> # mdadm.conf written out by anaconda
> >> MAILADDR root
> >> AUTO +imsm +1.x -all
> >> ARRAY /dev/md0 level=raid1 num-devices=4 UUID=9406b71d:8024a882:f17932f6:98d4df18
> >> ARRAY /dev/md1 level=raid5 num-devices=4 UUID=f5bb8db9:85f66b43:32a8282a:fb664152
> >
> > Please show the output of "lsdrv" [1] and then "mdadm -D /dev/md[01]", and also "mdadm -E /dev/sd[abcd][12]"
> >
> > (From within your rescue environment.)  Some errors are likely, but get what you can.
> 
> Great, thanks for your offer to help. Great program you've written.
> I've included the output here:
> 
> # mdadm -E /dev/sd[abcd][12]
> http://pastebin.com/3JcBjiV6
> 
> # When I booted into the rescue CD again, it mounted md0 as md127
> http://pastebin.com/yXnzzL6K
> 

Hmmm ... looks like a bit of a mess.  Two devices that should be active
members of the array appear to be spares.  I suspect you tried to --add them
when you shouldn't have.  Newer versions of mdadm stop you from doing that,
but older versions don't.  You only --add a device that you want to be a
spare, not a device that you think is already part of the array.
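
In the -E output you pasted, the "Device Role" line shows which devices
have been demoted to spares, e.g.

 mdadm -E /dev/sd[abcd]2 | grep -E '^/dev|Device Role'

An active member reports something like "Device Role : Active device 1";
a demoted one reports "Device Role : spare".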

All of the devices think that device 2 (the third in the array) should exist
and be working, but no device claims to be it.  Presumably it is /dev/sdc2.


You will need to recreate the array.
i.e.

 mdadm -S /dev/md1
or 
 mdadm -S /dev/md125 /dev/md126

or whatever md arrays claim to be holding any of the 4 devices according
to /proc/mdstat.
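
A quick

 cat /proc/mdstat

will list every array (possibly renumbered, like your md127) that has
claimed one of those devices, so you know exactly what to stop.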

Then

 mdadm -C /dev/md1 -e 1.1 --level 5 -n 4 --chunk 512 --assume-clean \
    /dev/sda2 /dev/sdb2 /dev/sdc2 missing

This will just re-write the metadata and assemble the array.  It won't change
the data.
Then "fsck -n /dev/md1" and make sure it looks good.
If it does: good.
If not, try again with sdd2 in place of sdc2.
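i.e. stop it and re-create with the other candidate in the third slot:

 mdadm -S /dev/md1
 mdadm -C /dev/md1 -e 1.1 --level 5 -n 4 --chunk 512 --assume-clean \
    /dev/sda2 /dev/sdb2 /dev/sdd2 missing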

Once you are happy that you can see your data, you can add the other device
as a spare and it will rebuild.
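
e.g. (using whichever of sdc2/sdd2 was left out of the -C above):

 mdadm --add /dev/md1 /dev/sdd2

and then watch the recovery progress in /proc/mdstat.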

You don't really need the --assume-clean above because a degraded RAID5 is
always assumed to be clean, but it is good practice to use --assume-clean
whenever re-creating an array which has real data on it.

Good luck,
NeilBrown
