Re: can i recover an all spare raid10 array ?

Please don't top post; it makes conversations very difficult to follow.
Responses should go at the bottom, or interleaved with the previous post
if responding to particular points. I've moved your previous responses
to keep the conversation flow straight.

On Tue Oct 28, 2014 at 07:30:50PM +0200, Roland RoLaNd wrote:
> 
> > From: r_o_l_a_n_d@xxxxxxxxxxx
> > To: robin@xxxxxxxxxxxxxxx
> > CC: linux-raid@xxxxxxxxxxxxxxx
> > Subject: Re: can i recover an all spare raid10 array ?
> > Date: Tue, 28 Oct 2014 19:29:25 +0200
> > 
> > > Date: Tue, 28 Oct 2014 17:01:11 +0000
> > > From: robin@xxxxxxxxxxxxxxx
> > > To: r_o_l_a_n_d@xxxxxxxxxxx
> > > CC: linux-raid@xxxxxxxxxxxxxxx
> > > Subject: Re: can i recover an all spare raid10 array ?
> > > 
> > > On Tue Oct 28, 2014 at 06:22:11PM +0200, Roland RoLaNd wrote:
> > > 
> > > > I have two raid arrays on my system:
> > > > raid1: /dev/sdd1 /dev/sdh1
> > > > raid10: /dev/sde1 /dev/sda1 /dev/sdf1 /dev/sdb1 /dev/sdc1 /dev/sdg1
> > > > 
> > > > 
> > > > Two disks had bad sectors: sdd and sdf <<-- they both got hot swapped.
> > > > I added sdf back to raid10 and recovery took place, but adding sdd1 to
> > > > raid1 proved to be troublesome.
> > > > As I didn't have anything important on '/', I formatted and installed
> > > > Ubuntu 14 on raid1.
> > > > 
> > > > Now the system is up on raid1, but raid10 (md127) is inactive:
> > > > 
> > > > cat /proc/mdstat
> > > > 
> > > > Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] 
> > > > md127 : inactive sde1[2](S) sdg1[8](S) sdc1[6](S) sdb1[5](S) sdf1[4](S) sda1[3](S)
> > > >       17580804096 blocks super 1.2
> > > >        
> > > > md2 : active raid1 sdh4[0] sdd4[1]
> > > >       2921839424 blocks super 1.2 [2/2] [UU]
> > > >       [==>..................]  resync = 10.4% (304322368/2921839424) finish=672.5min speed=64861K/sec
> > > >       
> > > > md1 : active raid1 sdh3[0] sdd3[1]
> > > >       7996352 blocks super 1.2 [2/2] [UU]
> > > >       
> > > > md0 : active raid1 sdh2[0] sdd2[1]
> > > >       292544 blocks super 1.2 [2/2] [UU]
> > > >       
> > > > unused devices: <none>
> > > > If I try to assemble md127:
> > > >   
> > > > 
> > > >   mdadm --assemble /dev/md127 /dev/sde1 /dev/sda1 /dev/sdf1 /dev/sdb1 /dev/sdc1 /dev/sdg1
> > > > mdadm: /dev/sde1 is busy - skipping
> > > > mdadm: /dev/sda1 is busy - skipping
> > > > mdadm: /dev/sdf1 is busy - skipping
> > > > mdadm: /dev/sdb1 is busy - skipping
> > > > mdadm: /dev/sdc1 is busy - skipping
> > > > mdadm: /dev/sdg1 is busy - skipping
> > > > 
> > > > 
> > > > If I try to add one of the disks:  mdadm --add /dev/md127 /dev/sdj1
> > > > mdadm: cannot get array info for /dev/md127
> > > > 
> > > > If I try:
> > > > 
> > > > mdadm --stop /dev/md127
> > > > mdadm: stopped /dev/md127
> > > > 
> > > > then running:   mdadm --assemble /dev/md127 /dev/sde1 /dev/sda1 /dev/sdf1 /dev/sdb1 /dev/sdc1 /dev/sdg1
> > > > 
> > > > returns: 
> > > >  
> > > > assembled from 5 drives and 1 rebuilding - not enough to start the array
> > > > 
> > > > What does it mean? Is my data lost?
> > > > 
> > > > If I examine one of the md127 raid10 array disks, it shows this:
> > > >  
> > > > mdadm --examine /dev/sde1
> > > > /dev/sde1:
> > > >           Magic : a92b4efc
> > > >         Version : 1.2
> > > >     Feature Map : 0x0
> > > >      Array UUID : ab90d4c8:41a55e14:635025cc:28f0ee76
> > > >            Name : ubuntu:data  (local to host ubuntu)
> > > >   Creation Time : Sat May 10 21:54:56 2014
> > > >      Raid Level : raid10
> > > >    Raid Devices : 8
> > > > 
> > > >  Avail Dev Size : 5860268032 (2794.39 GiB 3000.46 GB)
> > > >      Array Size : 11720534016 (11177.57 GiB 12001.83 GB)
> > > >   Used Dev Size : 5860267008 (2794.39 GiB 3000.46 GB)
> > > >     Data Offset : 262144 sectors
> > > >    Super Offset : 8 sectors
> > > >           State : clean
> > > >     Device UUID : a2a5db61:bd79f0ae:99d97f17:21c4a619
> > > > 
> > > >     Update Time : Tue Oct 28 10:07:18 2014
> > > >        Checksum : 409deeb4 - correct
> > > >          Events : 8655
> > > > 
> > > >          Layout : near=2
> > > >      Chunk Size : 512K
> > > > 
> > > >    Device Role : Active device 2
> > > >    Array State : AAAAAAAA ('A' == active, '.' == missing)
> > > > 
> > > > Used Dev Size : 5860267008 (2794.39 GiB 3000.46 GB) <<--- does this mean I still have my data?
> > > > 
> > > > 
> > > > The remaining two disks:
> > > > 
> > > >   mdadm --examine /dev/sdj1
> > > > mdadm: No md superblock detected on /dev/sdj1.
> > > >   mdadm --examine /dev/sdi1
> > > > mdadm: No md superblock detected on /dev/sdi1.
> > > 
> > > The --examine output indicates the RAID10 array was 8 members, not 6.
> > > As it stands, you are missing two array members (presumably a mirrored
> > > pair as mdadm won't start the array). Without these you're missing 512K
> > > of every 2M in the array, so your data is toast (well, with a lot of
> > > effort you may recover some files under 1.5M in size).
> > > 
> > > Were you expecting sdi1 and sdj1 to have been part of the original
> > > RAID10 array? Have you removed the superblocks from them at any point?
> > > For completeness, what mdadm and kernel versions are you running?
> > > 
> > > Cheers,
> > >     Robin
> > 
> > Thanks for pitching in. Here are the responses to your questions:
> > 
> > - Yes, I expected both of them to be part of the array, though one of
> > them had only just been added and hadn't finished recovering when the
> > raid1 "/" crashed.
> > 
According to your --examine earlier, the RAID10 rebuild had completed
(it shows the array clean with all disks active). Are you certain
that the new RAID1 array isn't using disks that used to be part of the
RAID10 array? Regardless, I'd expect the disks to have a superblock if
they were part of either array (unless they've been repartitioned?).
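
If the disks are still in the machine, something like this (just a
sketch; substitute whatever names sdi/sdj have now) should show whether
the partitions survived and whether an md superblock is sitting on the
whole disk rather than on the partition:

    lsblk -o NAME,SIZE,TYPE,MOUNTPOINT /dev/sdi /dev/sdj
    mdadm --examine /dev/sdi /dev/sdj      # superblock on the whole disk?
    mdadm --examine /dev/sdi1 /dev/sdj1    # superblock on the partitions?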

> > - I have not removed their superblocks, or at least not in a way that
> > I am aware of.
> > 
> > - mdadm: 3.2.5-5ubuntu4.1
> > - uname -a: 3.13.0-24-generic
> > 
That's a pretty old mdadm version, but I don't see anything in the
change logs that looks relevant. Others on the list may be more
familiar with any related issues, though.

> > 
> > PS: 
> > I just followed this recovery page: 
> > https://raid.wiki.kernel.org/index.php/Recovering_a_failed_software_RAID
> > I managed to reach the last step, but whenever I tried to mount it,
> > it kept asking me for the right filesystem.
> >
That's good documentation anyway. As long as you stick to the overlay
devices, your original data is untouched. It's amazing how many people
run --create on their original disks and lose any chance of getting the
data back.
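
For reference, the overlay recipe on that page boils down to roughly
the following per member (a sketch only; the sparse file size and
snapshot chunk size here are arbitrary placeholders):

    # sparse copy-on-write file plus a loop device for one member
    dd if=/dev/zero of=/tmp/overlay-sde1 bs=1M count=0 seek=4096
    loop=$(losetup -f --show /tmp/overlay-sde1)
    size=$(blockdev --getsz /dev/sde1)
    # all writes land in the overlay; /dev/sde1 itself is never touched
    dmsetup create overlay-sde1 --table "0 $size snapshot /dev/sde1 $loop P 8"

Any assemble or create experiments are then run against
/dev/mapper/overlay-* instead of the real partitions, and can be thrown
away afterwards simply by removing the overlays.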

> Correction: I couldn't force assemble the raid devices, so instead I issued:
>  mdadm --create /dev/md089 --assume-clean --level=10 --verbose --raid-devices=8  missing /dev/dm-1 /dev/dm-0 /dev/dm-5 /dev/dm-3 /dev/dm-2 missing  /dev/dm-4
> which got it into a degraded state
> 

What error did you get when you tried to force assemble (both from mdadm
and anything reported via dmesg)? The device order you're using would
suggest that the missing disks wouldn't be mirrors of each other, so the
data should be okay.
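
To spell out the mirror-pair point: with 8 devices and a near=2 layout,
adjacent roles form the mirror pairs, so (assuming your --create slot
order matches the original array) the picture is roughly:

    roles:  0 1 | 2 3 | 4 5 | 6 7    <- each pair holds one 512K chunk
                                        of every 2M stripe, stored twice

Your two "missing" slots are roles 0 and 6, which fall in different
pairs, so every chunk should still have one surviving copy, provided
the device order and offsets really do match the original.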

Can you post the --examine results for all the RAID members, both for
the original partitions and for the overlay devices after you recreated
the array? There may be differences in data offset, etc. which will
break the filesystem.
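
Something along these lines (an untested sketch; adjust the glob to
match your member partitions and whatever you named the overlays) would
gather the interesting fields in one go:

    for d in /dev/sd[abcefg]1 /dev/mapper/overlay-*; do
        echo "=== $d"
        mdadm --examine "$d" | grep -E 'Array UUID|Data Offset|Device Role|Events|Update Time'
    done

In particular, if the Data Offset on the recreated overlay array
differs from the 262144 sectors shown on the original members, the
filesystem won't line up and mount will fail in much the way you
describe.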

Cheers,
    Robin


