----------------------------------------
> Date: Tue, 28 Oct 2014 18:34:22 +0000
> From: robin@xxxxxxxxxxxxxxx
> To: r_o_l_a_n_d@xxxxxxxxxxx
> CC: robin@xxxxxxxxxxxxxxx; linux-raid@xxxxxxxxxxxxxxx
> Subject: Re: can i recover an all spare raid10 array ?
>
> Please don't top post, it makes conversations very difficult to follow.
> Responses should go at the bottom, or interleaved with the previous post
> if responding to particular points. I've moved your previous responses
> to keep the conversation flow straight.
>
> On Tue Oct 28, 2014 at 07:30:50PM +0200, Roland RoLaNd wrote:
>>
>>> From: r_o_l_a_n_d@xxxxxxxxxxx
>>> To: robin@xxxxxxxxxxxxxxx
>>> CC: linux-raid@xxxxxxxxxxxxxxx
>>> Subject: Re: can i recover an all spare raid10 array ?
>>> Date: Tue, 28 Oct 2014 19:29:25 +0200
>>>
>>>> Date: Tue, 28 Oct 2014 17:01:11 +0000
>>>> From: robin@xxxxxxxxxxxxxxx
>>>> To: r_o_l_a_n_d@xxxxxxxxxxx
>>>> CC: linux-raid@xxxxxxxxxxxxxxx
>>>> Subject: Re: can i recover an all spare raid10 array ?
>>>>
>>>> On Tue Oct 28, 2014 at 06:22:11PM +0200, Roland RoLaNd wrote:
>>>>
>>>>> I have two RAID arrays on my system:
>>>>> raid1:  /dev/sdd1 /dev/sdh1
>>>>> raid10: /dev/sde1 /dev/sda1 /dev/sdf1 /dev/sdb1 /dev/sdc1 /dev/sdg1
>>>>>
>>>>> Two disks had bad sectors: sdd and sdf <<-- they both got hot swapped.
>>>>> I added sdf back to the RAID10 and recovery took place, but adding sdd1
>>>>> to the RAID1 proved to be troublesome.
>>>>> As I didn't have anything important on '/', I formatted and installed
>>>>> Ubuntu 14 on the RAID1.
>>>>>
>>>>> Now the system is up on the RAID1, but the RAID10 (md127) is inactive.
>>>>>
>>>>> cat /proc/mdstat
>>>>>
>>>>> Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
>>>>> md127 : inactive sde1[2](S) sdg1[8](S) sdc1[6](S) sdb1[5](S) sdf1[4](S) sda1[3](S)
>>>>>       17580804096 blocks super 1.2
>>>>>
>>>>> md2 : active raid1 sdh4[0] sdd4[1]
>>>>>       2921839424 blocks super 1.2 [2/2] [UU]
>>>>>       [==>..................]  resync = 10.4% (304322368/2921839424) finish=672.5min speed=64861K/sec
>>>>>
>>>>> md1 : active raid1 sdh3[0] sdd3[1]
>>>>>       7996352 blocks super 1.2 [2/2] [UU]
>>>>>
>>>>> md0 : active raid1 sdh2[0] sdd2[1]
>>>>>       292544 blocks super 1.2 [2/2] [UU]
>>>>>
>>>>> unused devices: <none>
>>>>>
>>>>> If I try to assemble md127:
>>>>>
>>>>> mdadm --assemble /dev/md127 /dev/sde1 /dev/sda1 /dev/sdf1 /dev/sdb1 /dev/sdc1 /dev/sdg1
>>>>> mdadm: /dev/sde1 is busy - skipping
>>>>> mdadm: /dev/sda1 is busy - skipping
>>>>> mdadm: /dev/sdf1 is busy - skipping
>>>>> mdadm: /dev/sdb1 is busy - skipping
>>>>> mdadm: /dev/sdc1 is busy - skipping
>>>>> mdadm: /dev/sdg1 is busy - skipping
>>>>>
>>>>> If I try to add one of the disks: mdadm --add /dev/md127 /dev/sdj1
>>>>> mdadm: cannot get array info for /dev/md127
>>>>>
>>>>> If I try:
>>>>>
>>>>> mdadm --stop /dev/md127
>>>>> mdadm: stopped /dev/md127
>>>>>
>>>>> then running: mdadm --assemble /dev/md127 /dev/sde1 /dev/sda1 /dev/sdf1 /dev/sdb1 /dev/sdc1 /dev/sdg1
>>>>>
>>>>> returns:
>>>>>
>>>>> assembled from 5 drives and 1 rebuilding - not enough to start the array
>>>>>
>>>>> What does it mean? Is my data lost?
>>>>>
>>>>> If I examine one of the md127 RAID10 array disks, it shows this:
>>>>>
>>>>> mdadm --examine /dev/sde1
>>>>> /dev/sde1:
>>>>>           Magic : a92b4efc
>>>>>         Version : 1.2
>>>>>     Feature Map : 0x0
>>>>>      Array UUID : ab90d4c8:41a55e14:635025cc:28f0ee76
>>>>>            Name : ubuntu:data  (local to host ubuntu)
>>>>>   Creation Time : Sat May 10 21:54:56 2014
>>>>>      Raid Level : raid10
>>>>>    Raid Devices : 8
>>>>>
>>>>>  Avail Dev Size : 5860268032 (2794.39 GiB 3000.46 GB)
>>>>>      Array Size : 11720534016 (11177.57 GiB 12001.83 GB)
>>>>>   Used Dev Size : 5860267008 (2794.39 GiB 3000.46 GB)
>>>>>     Data Offset : 262144 sectors
>>>>>    Super Offset : 8 sectors
>>>>>           State : clean
>>>>>     Device UUID : a2a5db61:bd79f0ae:99d97f17:21c4a619
>>>>>
>>>>>     Update Time : Tue Oct 28 10:07:18 2014
>>>>>        Checksum : 409deeb4 - correct
>>>>>          Events : 8655
>>>>>
>>>>>          Layout : near=2
>>>>>      Chunk Size : 512K
>>>>>
>>>>>    Device Role : Active device 2
>>>>>    Array State : AAAAAAAA ('A' == active, '.' == missing)
>>>>>
>>>>> Used Dev Size : 5860267008 (2794.39 GiB 3000.46 GB)  <<--- does this mean I still have my data?
>>>>>
>>>>> The remaining two disks:
>>>>>
>>>>> mdadm --examine /dev/sdj1
>>>>> mdadm: No md superblock detected on /dev/sdj1.
>>>>> mdadm --examine /dev/sdi1
>>>>> mdadm: No md superblock detected on /dev/sdi1.
>>>>
>>>> The --examine output indicates the RAID10 array had 8 members, not 6.
>>>> As it stands, you are missing two array members (presumably a mirrored
>>>> pair, as mdadm won't start the array). Without these you're missing 512K
>>>> of every 2M in the array, so your data is toast (well, with a lot of
>>>> effort you may recover some files under 1.5M in size).
>>>>
>>>> Were you expecting sdi1 and sdj1 to have been part of the original
>>>> RAID10 array? Have you removed the superblocks from them at any point?
>>>> For completeness, what mdadm and kernel versions are you running?
>>>>
>>>> Cheers,
>>>> Robin
>>>
>>> Thanks for pitching in. Here are the responses to your questions:
>>>
>>> - Yes, I expected both of them to be part of the array, though one of
>>>   them had only just been added and didn't finish recovering when the
>>>   RAID1 "/" crashed.
>>>
> According to your --examine earlier, the RAID10 rebuild had completed
> (it shows the array clean and having all disks active). Are you certain
> that the new RAID1 array isn't using disks that used to be part of the
> RAID10 array? Regardless, I'd expect the disks to have a superblock if
> they were part of either array (unless they've been repartitioned?).
>

The --examine earlier was of one of the 6 disks that belong to the currently
inactive array; they're all clean.
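A quick way to double-check which array each device claims to belong to, and
whether their event counts still agree, is to dump the key superblock fields
from every member at once. This is only a rough sketch (bash), assuming the
same sd[a-j]1 partitions listed in the lsblk output below:

    for d in /dev/sd[a-j]1; do
        echo "== $d"
        # Only the fields already shown in the --examine output above:
        # which array, which slot, and how current the superblock is.
        mdadm --examine "$d" 2>/dev/null | \
            egrep 'Array UUID|Device Role|Events|Update Time|Array State'
    done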
As for the RAID1/RAID10 arrays, that's what I thought, as it has happened to
me before, but lsblk shows the following:

NAME      MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
sda         8:0    0   2.7T  0 disk
└─sda1      8:1    0   2.7T  0 part
sdb         8:16   0   2.7T  0 disk
└─sdb1      8:17   0   2.7T  0 part
sdc         8:32   0   2.7T  0 disk
└─sdc1      8:33   0   2.7T  0 part
sdd         8:48   0   2.7T  0 disk
├─sdd1      8:49   0     1M  0 part
├─sdd2      8:50   0   286M  0 part
│ └─md0     9:0    0 285.7M  0 raid1 /boot
├─sdd3      8:51   0   7.6G  0 part
│ └─md1     9:1    0   7.6G  0 raid1 [SWAP]
└─sdd4      8:52   0   2.7T  0 part
  └─md2     9:2    0   2.7T  0 raid1 /
sde         8:64   0   2.7T  0 disk
└─sde1      8:65   0   2.7T  0 part
sdf         8:80   0   2.7T  0 disk
└─sdf1      8:81   0   2.7T  0 part
sdg         8:96   0   2.7T  0 disk
└─sdg1      8:97   0   2.7T  0 part
sdh         8:112  0   2.7T  0 disk
├─sdh1      8:113  0     1M  0 part
├─sdh2      8:114  0   286M  0 part
│ └─md0     9:0    0 285.7M  0 raid1 /boot
├─sdh3      8:115  0   7.6G  0 part
│ └─md1     9:1    0   7.6G  0 raid1 [SWAP]
└─sdh4      8:116  0   2.7T  0 part
  └─md2     9:2    0   2.7T  0 raid1 /
sdi         8:128  0   2.7T  0 disk
└─sdi1      8:129  0   2.7T  0 part
sdj         8:144  0   2.7T  0 disk
└─sdj1      8:145  0   2.7T  0 part

>>> - I have not removed their superblocks, or at least not in a way that I
>>>   am aware of.
>>>
>>> - mdadm: 3.2.5-5ubuntu4.1
>>> - uname -a: 3.13.0-24-generic
>>>
> That's a pretty old mdadm version, but I don't see anything in the
> change logs that looks relevant. Others may be more familiar with issues
> though.

That's the latest in my current Ubuntu repository.

>
>>>
>>> PS:
>>> I just followed this recovery page:
>>> https://raid.wiki.kernel.org/index.php/Recovering_a_failed_software_RAID
>>> I managed to reach the last step, but whenever I tried to mount it kept
>>> asking me for the right filesystem.
>>>
> That's good documentation anyway. As long as you stick to the overlay
> devices your original data is untouched. It's amazing how many people
> run --create on their original disks and lose any chance of getting the
> data back.

Unfortunately I used to be / am one of those people. I'd had bad experiences
with this before, so I took it slow and went with the overlay documentation.
All the ebooks I could find about RAID talk about the differences between the
various RAID levels, but none are thorough when it comes to setting up or
troubleshooting RAID. And once I do fix my issue, I move on to the next
firefighting situation, so I lose interest due to lack of time.

>
>> Correction: I couldn't force assemble the RAID devices, so instead I issued:
>>
>> mdadm --create /dev/md089 --assume-clean --level=10 --verbose --raid-devices=8 missing /dev/dm-1 /dev/dm-0 /dev/dm-5 /dev/dm-3 /dev/dm-2 missing /dev/dm-4
>>
>> which got it into a degraded state.
>>
>
> What error did you get when you tried to force assemble (both from mdadm
> and anything reported via dmesg)? The device order you're using would
> suggest that the missing disks wouldn't be mirrors of each other, so the
> data should be okay.

mdadm --assemble --force /dev/md100 $OVERLAYS
mdadm: /dev/md100 assembled from 5 drives and 1 rebuilding - not enough to start the array.

dmesg:
[ 6025.573964] md: md100 stopped.
[ 6025.595810] md: bind<dm-0>
[ 6025.596086] md: bind<dm-5>
[ 6025.596364] md: bind<dm-2>
[ 6025.596612] md: bind<dm-1>
[ 6025.596840] md: bind<dm-4>
[ 6025.597026] md: bind<dm-3>

>
> Can you post the --examine results for all the RAID members? Both for
> the original partitions and for the overlay devices after you recreated
> the array. There may be differences in data offset, etc. which will
> break the filesystem.
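The quickest check for that is to compare the Data Offset recorded on the
original partitions with the one on the recreated (overlay) members, since a
--create that picks a different offset shifts where the filesystem starts. A
minimal sketch (bash), assuming the six member partitions and the dm-0..dm-5
overlay devices used earlier in the thread:

    for d in /dev/sd{a,b,c,e,f,g}1 /dev/dm-{0..5}; do
        printf '%-12s ' "$d"
        # Print the "Data Offset : N sectors" line, or note a missing superblock.
        mdadm --examine "$d" 2>/dev/null | grep 'Data Offset' || echo '(no superblock)'
    done

If the offsets differ and the mdadm in use is new enough (3.3 or later, which
is newer than the 3.2.5 mentioned above), the array can be recreated on the
overlays with an explicit --data-offset matching the original 262144 sectors;
check the mdadm man page for the units that option expects.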
Original partitions: http://pastebin.com/nHCxidvE
Overlay: http://pastebin.com/eva4cnu6

>
> Cheers,
> Robin
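As a reference for the overlay approach discussed above: the linked wiki
page's method boils down to a device-mapper snapshot over each member, backed
by a sparse copy-on-write file, so any experimental writes never touch the
real disks. A minimal sketch for a single member, with placeholder file names
and sizes (the wiki scripts this over every device):

    dev=/dev/sda1                                 # one array member (example)
    truncate -s 4G /tmp/overlay-sda1              # sparse file to absorb writes
    loop=$(losetup -f --show /tmp/overlay-sda1)   # loop device over the sparse file
    size=$(blockdev --getsz "$dev")               # member size in 512-byte sectors
    echo "0 $size snapshot $dev $loop P 8" | dmsetup create overlay-sda1
    # Assemble or --create only against /dev/mapper/overlay-*; the real
    # partitions stay untouched and the experiment can be torn down with
    # dmsetup remove and losetup -d.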