Re: can i recover an all spare raid10 array ?

On Tue Oct 28, 2014 at 10:17:44PM +0200, Roland RoLaNd wrote:

> 
> 
> ----------------------------------------
> > Date: Tue, 28 Oct 2014 20:02:39 +0000
> > From: robin@xxxxxxxxxxxxxxx
> > To: r_o_l_a_n_d@xxxxxxxxxxx
> > CC: linux-raid@xxxxxxxxxxxxxxx
> > Subject: Re: can i recover an all spare raid10 array ?
> >
> > On Tue Oct 28, 2014 at 09:11:21PM +0200, Roland RoLaNd wrote:
> >
> >>
> >>
> >> ----------------------------------------
> >>> Date: Tue, 28 Oct 2014 18:34:22 +0000
> >>> From: robin@xxxxxxxxxxxxxxx
> >>> To: r_o_l_a_n_d@xxxxxxxxxxx
> >>> CC: robin@xxxxxxxxxxxxxxx; linux-raid@xxxxxxxxxxxxxxx
> >>> Subject: Re: can i recover an all spare raid10 array ?
> >>>
> >>> Please don't top post, it makes conversations very difficult to follow.
> >>> Responses should go at the bottom, or interleaved with the previous post
> >>> if responding to particular points. I've moved your previous responses
> >>> to keep the conversation flow straight.
> >>>
> >>> On Tue Oct 28, 2014 at 07:30:50PM +0200, Roland RoLaNd wrote:
> >>>>
> >>>>> From: r_o_l_a_n_d@xxxxxxxxxxx
> >>>>> To: robin@xxxxxxxxxxxxxxx
> >>>>> CC: linux-raid@xxxxxxxxxxxxxxx
> >>>>> Subject: Re: can i recover an all spare raid10 array ?
> >>>>> Date: Tue, 28 Oct 2014 19:29:25 +0200
> >>>>>
> >>>>>> Date: Tue, 28 Oct 2014 17:01:11 +0000
> >>>>>> From: robin@xxxxxxxxxxxxxxx
> >>>>>> To: r_o_l_a_n_d@xxxxxxxxxxx
> >>>>>> CC: linux-raid@xxxxxxxxxxxxxxx
> >>>>>> Subject: Re: can i recover an all spare raid10 array ?
> >>>>>>
> >>>>>> On Tue Oct 28, 2014 at 06:22:11PM +0200, Roland RoLaNd wrote:
> >>>>>>
> >>>>>>> I have two raid arrays on my system:
> >>>>>>> raid1: /dev/sdd1 /dev/sdh1
> >>>>>>> raid10: /dev/sde1 /dev/sda1 /dev/sdf1 /dev/sdb1 /dev/sdc1 /dev/sdg1
> >>>>>>>
> >>>>>>>
> >>>>>>> two disks had bad sectors: sdd and sdf <<-- they both got hot swapped.
> >>>>>>> i added sdf back to the raid10 and recovery took place, but adding
> >>>>>>> sdd1 back to the raid1 proved to be troublesome.
> >>>>>>> as i didn't have anything important on '/', i formatted and installed
> >>>>>>> ubuntu 14 on the raid1.
> >>>>>>>
> >>>>>>> now the system is up on raid1, but the raid10 (md127) is inactive
> >>>>>>>
> >>>>>>> cat /proc/mdstat
> >>>>>>>
> >>>>>>> Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
> >>>>>>> md127 : inactive sde1[2](S) sdg1[8](S) sdc1[6](S) sdb1[5](S) sdf1[4](S) sda1[3](S)
> >>>>>>> 17580804096 blocks super 1.2
> >>>>>>>
> >>>>>>> md2 : active raid1 sdh4[0] sdd4[1]
> >>>>>>> 2921839424 blocks super 1.2 [2/2] [UU]
> >>>>>>> [==>..................] resync = 10.4% (304322368/2921839424) finish=672.5min speed=64861K/sec
> >>>>>>>
> >>>>>>> md1 : active raid1 sdh3[0] sdd3[1]
> >>>>>>> 7996352 blocks super 1.2 [2/2] [UU]
> >>>>>>>
> >>>>>>> md0 : active raid1 sdh2[0] sdd2[1]
> >>>>>>> 292544 blocks super 1.2 [2/2] [UU]
> >>>>>>>
> >>>>>>> unused devices: <none>
> >>>>>>> if i try to assemble md127
> >>>>>>>
> >>>>>>>
> >>>>>>> mdadm --assemble /dev/md127 /dev/sde1 /dev/sda1 /dev/sdf1 /dev/sdb1 /dev/sdc1 /dev/sdg1
> >>>>>>> mdadm: /dev/sde1 is busy - skipping
> >>>>>>> mdadm: /dev/sda1 is busy - skipping
> >>>>>>> mdadm: /dev/sdf1 is busy - skipping
> >>>>>>> mdadm: /dev/sdb1 is busy - skipping
> >>>>>>> mdadm: /dev/sdc1 is busy - skipping
> >>>>>>> mdadm: /dev/sdg1 is busy - skipping
> >>>>>>>
> >>>>>>>
> >>>>>>> if i try to add one of the disks: mdadm --add /dev/md127 /dev/sdj1
> >>>>>>> mdadm: cannot get array info for /dev/md127
> >>>>>>>
> >>>>>>> if i try:
> >>>>>>>
> >>>>>>> mdadm --stop /dev/md127
> >>>>>>> mdadm: stopped /dev/md127
> >>>>>>>
> >>>>>>> then running: mdadm --assemble /dev/md127 /dev/sde1 /dev/sda1 /dev/sdf1 /dev/sdb1 /dev/sdc1 /dev/sdg1
> >>>>>>>
> >>>>>>> returns:
> >>>>>>>
> >>>>>>> assembled from 5 drives and 1 rebuilding - not enough to start the array
> >>>>>>>
> >>>>>>> what does it mean ? is my data lost ?
> >>>>>>>
> >>>>>>> if i examine one of the md127 raid 10 array disks it shows this:
> >>>>>>>
> >>>>>>> mdadm --examine /dev/sde1
> >>>>>>> /dev/sde1:
> >>>>>>> Magic : a92b4efc
> >>>>>>> Version : 1.2
> >>>>>>> Feature Map : 0x0
> >>>>>>> Array UUID : ab90d4c8:41a55e14:635025cc:28f0ee76
> >>>>>>> Name : ubuntu:data (local to host ubuntu)
> >>>>>>> Creation Time : Sat May 10 21:54:56 2014
> >>>>>>> Raid Level : raid10
> >>>>>>> Raid Devices : 8
> >>>>>>>
> >>>>>>> Avail Dev Size : 5860268032 (2794.39 GiB 3000.46 GB)
> >>>>>>> Array Size : 11720534016 (11177.57 GiB 12001.83 GB)
> >>>>>>> Used Dev Size : 5860267008 (2794.39 GiB 3000.46 GB)
> >>>>>>> Data Offset : 262144 sectors
> >>>>>>> Super Offset : 8 sectors
> >>>>>>> State : clean
> >>>>>>> Device UUID : a2a5db61:bd79f0ae:99d97f17:21c4a619
> >>>>>>>
> >>>>>>> Update Time : Tue Oct 28 10:07:18 2014
> >>>>>>> Checksum : 409deeb4 - correct
> >>>>>>> Events : 8655
> >>>>>>>
> >>>>>>> Layout : near=2
> >>>>>>> Chunk Size : 512K
> >>>>>>>
> >>>>>>> Device Role : Active device 2
> >>>>>>> Array State : AAAAAAAA ('A' == active, '.' == missing)
> >>>>>>>
> >>>>>>> Used Dev Size : 5860267008 (2794.39 GiB 3000.46 GB) <<--- does this mean i still have my data ?
> >>>>>>>
> >>>>>>>
> >>>>>>> the remaining two disks:
> >>>>>>>
> >>>>>>> mdadm --examine /dev/sdj1
> >>>>>>> mdadm: No md superblock detected on /dev/sdj1.
> >>>>>>> mdadm --examine /dev/sdi1
> >>>>>>> mdadm: No md superblock detected on /dev/sdi1.
> >>>>>>
> >>>>>> The --examine output indicates the RAID10 array was 8 members, not 6.
> >>>>>> As it stands, you are missing two array members (presumably a mirrored
> >>>>>> pair as mdadm won't start the array). Without these you're missing 512K
> >>>>>> of every 2M in the array, so your data is toast (well, with a lot of
> >>>>>> effort you may recover some files under 1.5M in size).
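> >>>>>>
> >>>>>> Back-of-the-envelope sketch (nothing to run against the array, just
> >>>>>> illustrating the layout): with near=2 over 8 devices and 512K chunks,
> >>>>>> logical chunk N lands on the adjacent device pair starting at
> >>>>>> 2*(N mod 4), so any one pair holds every 4th chunk:
> >>>>>>
> >>>>>> # which mirrored pair holds each logical chunk (near=2, 8 devices)
> >>>>>> for n in $(seq 0 7); do
> >>>>>>     pair=$(( (n % 4) * 2 ))
> >>>>>>     echo "chunk $n -> devices $pair and $((pair + 1))"
> >>>>>> done
> >>>>>>
> >>>>>> Lose one pair and you lose 512K out of every 2M of the array.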
> >>>>>>
> >>>>>> Were you expecting sdi1 and sdj1 to have been part of the original
> >>>>>> RAID10 array? Have you removed the superblocks from them at any point?
> >>>>>> For completeness, what mdadm and kernel versions are you running?
> >>>>>>
> >>>>>> Cheers,
> >>>>>> Robin
> >>>>>
> >>>>> Thanks for pitching in. Here are the responses to your questions:
> >>>>>
> >>>>> - yes, i expected both of them to be part of the array, though one of
> >>>>> them had only just been added and hadn't finished recovering when the
> >>>>> raid1 "/" crashed
> >>>>>
> >>> According to your --examine earlier, the RAID10 rebuild had completed
> >>> (it shows the array clean and having all disks active). Are you certain
> >>> that the new RAID1 array isn't using disks that used to be part of the
> >>> RAID10 array? Regardless, I'd expect the disks to have a superblock if
> >>> they were part of either array (unless they've been repartitioned?).
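> >>>
> >>> A read-only way to double-check which array (if any) each partition
> >>> still claims membership of would be something like this (just a
> >>> sketch - adjust the device glob to suit):
> >>>
> >>> # print the array UUID and role recorded in each md superblock
> >>> for d in /dev/sd[a-j]1; do
> >>>     echo "== $d"
> >>>     mdadm --examine "$d" 2>&1 | grep -E 'Array UUID|Device Role'
> >>> done
> >>>
> >>> Anything that was ever a member should still report a UUID unless it
> >>> has been repartitioned or had its superblock zeroed.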
> >>>
> >>
> >> the examine earlier was of one of the 6 disks that belong to the current inactive array; they're all clean.
> >> as for the raid1/raid10 arrays sharing disks, that's what i thought too, as it has happened to me before, but lsblk shows the following:
> >>
> >> NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
> >> sda 8:0 0 2.7T 0 disk
> >> └─sda1 8:1 0 2.7T 0 part
> >> sdb 8:16 0 2.7T 0 disk
> >> └─sdb1 8:17 0 2.7T 0 part
> >> sdc 8:32 0 2.7T 0 disk
> >> └─sdc1 8:33 0 2.7T 0 part
> >> sdd 8:48 0 2.7T 0 disk
> >> ├─sdd1 8:49 0 1M 0 part
> >> ├─sdd2 8:50 0 286M 0 part
> >> │ └─md0 9:0 0 285.7M 0 raid1 /boot
> >> ├─sdd3 8:51 0 7.6G 0 part
> >> │ └─md1 9:1 0 7.6G 0 raid1 [SWAP]
> >> └─sdd4 8:52 0 2.7T 0 part
> >> └─md2 9:2 0 2.7T 0 raid1 /
> >> sde 8:64 0 2.7T 0 disk
> >> └─sde1 8:65 0 2.7T 0 part
> >> sdf 8:80 0 2.7T 0 disk
> >> └─sdf1 8:81 0 2.7T 0 part
> >> sdg 8:96 0 2.7T 0 disk
> >> └─sdg1 8:97 0 2.7T 0 part
> >> sdh 8:112 0 2.7T 0 disk
> >> ├─sdh1 8:113 0 1M 0 part
> >> ├─sdh2 8:114 0 286M 0 part
> >> │ └─md0 9:0 0 285.7M 0 raid1 /boot
> >> ├─sdh3 8:115 0 7.6G 0 part
> >> │ └─md1 9:1 0 7.6G 0 raid1 [SWAP]
> >> └─sdh4 8:116 0 2.7T 0 part
> >> └─md2 9:2 0 2.7T 0 raid1 /
> >> sdi 8:128 0 2.7T 0 disk
> >> └─sdi1 8:129 0 2.7T 0 part
> >> sdj 8:144 0 2.7T 0 disk
> >> └─sdj1 8:145 0 2.7T 0 part
> >>
> >>
> >>>>> - i have not removed their superblocks, or at least not in a way that
> >>>>> i am aware of
> >>>>>
> >>>>> - mdadm: 3.2.5-5ubuntu4.1
> >>>>> - uname -a: 3.13.0-24-generic
> >>>>>
> >>> That's a pretty old mdadm version, but I don't see anything in the
> >>> change logs that looks relevant. Others may be more familiar with issues
> >>> though.
> >>
> >> that's the latest in my current ubuntu repository
> >>
> >>>
> >>>>>
> >>>>> PS:
> >>>>> I just followed this recovery page:
> >>>>> https://raid.wiki.kernel.org/index.php/Recovering_a_failed_software_RAID
> >>>>> I managed to reach the last step, whenever i tried to mount it kept
> >>>>> asking me for the right file system
> >>>>>
> >>> That's good documentation anyway. As long as you stick to the overlay
> >>> devices your original data is untouched. It's amazing how many people
> >>> run --create on their original disks and lose any chance of getting the
> >>> data back.
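> >>>
> >>> Those overlays are essentially dm snapshots backed by sparse files, so
> >>> all writes land in the sparse file and the real partition is never
> >>> touched. Per device it boils down to roughly this (a sketch only, with
> >>> made-up sizes/paths - the script on the wiki page is the one to use):
> >>>
> >>> # sparse file to absorb any writes
> >>> truncate -s 50G /tmp/overlay-sde1
> >>> loop=$(losetup -f --show /tmp/overlay-sde1)
> >>> # snapshot target: reads fall through to sde1, writes go to the loop file
> >>> size=$(blockdev --getsz /dev/sde1)
> >>> dmsetup create overlay-sde1 --table "0 $size snapshot /dev/sde1 $loop N 8"
> >>>
> >>> and you then assemble /dev/mapper/overlay-* instead of the real
> >>> partitions.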
> >>
> >> unfortunately i used to be/am one of those people.
> >> had bad experiences with this before, so i took it slow and went with
> >> the overlay documentation.
> >> all the ebooks i could find about raid talk about the differences
> >> between raid levels, but none are thorough when it comes to setting
> >> up/troubleshooting raid.
> >> and once i do fix my issue, i move on to the next firefighting
> >> situation so i lose interest due to lack of time.
> >>
> >>>
> >>>> Correction: i couldn't force assemble the raid devices, so instead i issued:
> >>>> mdadm --create /dev/md089 --assume-clean --level=10 --verbose --raid-devices=8 missing /dev/dm-1 /dev/dm-0 /dev/dm-5 /dev/dm-3 /dev/dm-2 missing /dev/dm-4
> >>>> which got it into degraded state
> >>>>
> >>>
> >>> What error did you get when you tried to force assemble (both from mdadm
> >>> and anything reported via dmesg)? The device order you're using would
> >>> suggest that the missing disks wouldn't be mirrors of each other, so the
> >>> data should be okay.
> >>
> >> mdadm --assemble --force /dev/md100 $OVERLAYS
> >> mdadm: /dev/md100 assembled from 5 drives and 1 rebuilding - not enough to start the array.
> >>
> > That's very odd - all the --examine results for the original disks show
> > the array as clean. That would suggest an issue with the installed
> > version of mdadm but it doesn't really matter in this case - see below.
> 
> when i got ubuntu 14 installed, i issued apt-get update && apt-get upgrade -y
> would that have affected anything ? 
> 
No - it's just that there must be some bug in that version of mdadm.
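
If you want to rule the packaged mdadm out, it's easy enough to build a
newer one from source and point it at the overlays only, without touching
the installed package (rough sketch - the md device name and $OVERLAYS are
just the ones from this thread):

# build a current mdadm alongside the packaged one
git clone git://git.kernel.org/pub/scm/utils/mdadm/mdadm.git
cd mdadm && make
./mdadm --version
# retry the forced assemble against the overlay devices
./mdadm --stop /dev/md100
./mdadm --assemble --force /dev/md100 $OVERLAYS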

> >
> >> dmesg:
> >> [ 6025.573964] md: md100 stopped.
> >> [ 6025.595810] md: bind<dm-0>
> >> [ 6025.596086] md: bind<dm-5>
> >> [ 6025.596364] md: bind<dm-2>
> >> [ 6025.596612] md: bind<dm-1>
> >> [ 6025.596840] md: bind<dm-4>
> >> [ 6025.597026] md: bind<dm-3>
> >>
> >>>
> >>> Can you post the --examine results for all the RAID members? Both for
> >>> the original partitions and for the overlay devices after you recreated
> >>> the array. There may be differences in data offset, etc. which will
> >>> break the filesystem.
> >>
> >> Original partitions:
> >> http://pastebin.com/nHCxidvE
> >>
> >> overlay:
> >> http://pastebin.com/eva4cnu6
> >>
> >
> > Right - these show you have the wrong order. The original partition
> > array device roles are:
> > sde1: 2
> > sda1: 3
> > sdf1: 4
> > sdb1: 5
> > sdc1: 6
> > sdg1: 7
> >
> > and your overlays are:
> > dm-1: 1
> > dm-0: 2
> > dm-5: 3
> > dm-3: 4
> > dm-2: 5
> > dm-4: 7
> >
> > So the bad news is that you're missing roles 0 & 1, which will be
> > mirrors. That means your array is broken unless any other member disks
> > can be found.
> 
> am i mistaken to think that the order of disks in an array can be known from the "   Device Role : Active device Z " in mdadm --examine /dev/sdXN ?
> 
That's correct, for version 1.x metadata anyway (I think 0.9 reports
things differently). This may be different from the number after the
device in /proc/mdstat though (which indicates the order the device was
added, so keeps increasing as the disks are replaced).
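
If it helps, the two numbers can be compared directly - read-only, using
sde1 from your earlier output as the example:

# role recorded in the superblock vs. the [n] slot shown in /proc/mdstat
mdadm --examine /dev/sde1 | grep 'Device Role'
grep -o 'sde1\[[0-9]*\]' /proc/mdstat

The first is the device's position in the array layout; the second is just
the slot mdadm assigned when the device was (re)added.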

Cheers,
    Robin
-- 
     ___        
    ( ' }     |       Robin Hill        <robin@xxxxxxxxxxxxxxx> |
   / / )      | Little Jim says ....                            |
  // !!       |      "He fallen in de water !!"                 |
