On Mon, 28 May 2012 00:14:55 -0700 Jeff Johnson <jeff.johnson@xxxxxxxxxxxxxxxxx> wrote:

> Greetings,
>
> I am looking at a very unique situation and trying to successfully
> recover 1TB of very critical data.
>
> The md raid in question is a 12-drive RAID-10 sitting between two
> identical nodes via a shared SAS link. Originally the 12 drives were
> configured as two six-drive RAID-10 volumes using the entire disk
> device (no partitions on member drives). That configuration was later
> scrapped in favor of a single 12-drive RAID-10, but in this
> configuration a single partition was created and the partition was
> used as the RAID member device instead of the entire disk (sdb1 vs sdb).
>
> One of the systems had the old two six-drive RAID-10 mdadm.conf file
> left in /etc. Due to a power outage both systems went down and then
> rebooted. When one system, the one with the old mdadm.conf file, came
> up, md referenced the file, saw the intact old superblocks at the
> beginning of the drives and started an assemble and resync of those
> two six-drive RAID-10 volumes. The resync process got to 40% before it
> was stopped.
>
> The other system managed to enumerate the drives and see the partition
> maps prior to the other node assembling the old superblock config. I
> can still see the newer md superblocks that start on the partition
> boundary rather than at the beginning of the physical drive.
>
> It appears that md's overwrite protection was in a way circumvented by
> the old superblocks matching the old mdadm.conf file and not seeing
> conflicting superblocks at the beginning of the partition boundaries.
>
> Both versions, old and new, were RAID-10. It appears that the errant
> resync of the old configuration didn't corrupt the newer RAID config,
> since the drives were allocated in the same order and the same drives
> were paired (mirrors) in both old and new configs. I am guessing that
> since the striping method was RAID-0, the absence of stripe parity to
> check kept the data on the drives from being corrupted. This is
> conjecture on my part.
>
> Old config:
> RAID-10, /dev/md0, /dev/sd[bcdefg]
> RAID-10, /dev/md1, /dev/sd[hijklm]
>
> New config:
> RAID-10, /dev/md0, /dev/sd[bcdefghijklm]1
>
> It appears that the old superblock remained in the ~17KB gap between
> the physical start of the disk and the start boundary of partition 1,
> where the new superblock was written.
>
> I was able to still see the partitions on the other node. I was able
> to read the new-config superblocks from 11 of the 12 drives. UUIDs,
> state, all seem to be correct.
>
> Three questions:
>
> 1) Has anyone seen a situation like this before?

I haven't.

> 2) Is it possible that since the mirrored pairs were allocated in the
> same order that the data was not overwritten?

Certainly possible.

> 3) What is the best way to assemble and run a 12-drive RAID-10 with
> member drive 0 (sdb1) seemingly blank (no superblock)?

It would be good to work out exactly why sdb1 is blank, as knowing that
might provide useful insight into the overall situation. However, it
probably isn't critical.

The --assemble command you list below should be perfectly safe and allow
read access without risking any corruption. If you

  echo 1 > /sys/module/md_mod/parameters/start_ro

then it will be even safer (if that is possible). It will certainly not
write anything until you write to the array yourself. You can then
'fsck -n', 'mount -o ro' and copy any super-critical files before
proceeding.
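Putting those steps together, the whole read-only pass might look
something like this. It is only a sketch: /dev/md0 is the array name
from the new config above, /mnt/recovery is an arbitrary scratch mount
point, and sdb1 is left out of the device list because it currently has
no superblock, so with --run the array should simply start degraded.

  # start newly assembled arrays read-only until something writes to them
  echo 1 > /sys/module/md_mod/parameters/start_ro

  # assemble from the new, partition-based superblocks only
  mdadm -A /dev/md0 --uuid=852267e0:095a343c:f4f590ad:3333cb43 \
        --run /dev/sd[cdefghijklm]1

  # look but don't touch, then copy off the critical files
  fsck -n /dev/md0
  mount -o ro /dev/md0 /mnt/recovery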
I would then probably

  echo check > /sys/block/md0/md/sync_action

just to see that everything is OK (a low mismatch count is expected).

I also recommend removing the old superblocks.

  mdadm --zero-superblock --metadata=0.90 /dev/sdc

will look for a 0.90 superblock on sdc and, if it finds one, erase it.
You should first double check with

  mdadm --examine --metadata=0.90 /dev/sdc

to ensure that is the one you want to remove (without the --metadata=0.90
it will look for other metadata, and you might not want it to do that
without checking first). A per-drive sketch of that check follows after
the quoted output at the end of this mail.

Good luck,
NeilBrown

>
> The current state of the 12-drive volume is: (note: sdb1 has no
> superblock but the drive is physically fine)
>
> /dev/sdc1:
>           Magic : a92b4efc
>         Version : 0.90.00
>            UUID : 852267e0:095a343c:f4f590ad:3333cb43
>   Creation Time : Tue Feb 14 18:56:08 2012
>      Raid Level : raid10
>   Used Dev Size : 586059136 (558.91 GiB 600.12 GB)
>      Array Size : 3516354816 (3353.46 GiB 3600.75 GB)
>    Raid Devices : 12
>   Total Devices : 12
> Preferred Minor : 0
>
>     Update Time : Sat May 26 12:05:11 2012
>           State : clean
>  Active Devices : 12
> Working Devices : 12
>  Failed Devices : 0
>   Spare Devices : 0
>        Checksum : 21bca4ce - correct
>          Events : 26
>
>          Layout : near=2
>      Chunk Size : 32K
>
>       Number   Major   Minor   RaidDevice   State
> this     1       8       33        1        active sync   /dev/sdc1
>
>    0     0       8       17        0        active sync
>    1     1       8       33        1        active sync   /dev/sdc1
>    2     2       8       49        2        active sync   /dev/sdd1
>    3     3       8       65        3        active sync   /dev/sde1
>    4     4       8       81        4        active sync   /dev/sdf1
>    5     5       8       97        5        active sync   /dev/sdg1
>    6     6       8      113        6        active sync   /dev/sdh1
>    7     7       8      129        7        active sync   /dev/sdi1
>    8     8       8      145        8        active sync   /dev/sdj1
>    9     9       8      161        9        active sync   /dev/sdk1
>   10    10       8      177       10        active sync   /dev/sdl1
>   11    11       8      193       11        active sync   /dev/sdm1
>
> I could just run 'mdadm -A --uuid=852267e0095a343cf4f590ad3333cb43
> /dev/sd[bcdefghijklm]1 --run' but I feel better seeking advice and
> consensus before doing anything.
>
> I have never seen a situation like this before. It seems like there
> might be one correct way to get the data back and many ways of losing
> the data for good. Any advice or feedback is greatly appreciated!
>
> --Jeff
>
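As mentioned above, a per-drive pass over the whole-disk devices
(sd[b-m], from the old config) to look for stale 0.90 superblocks might
look like this. It is only a sketch and deliberately stops at printing
what it finds; the actual erase remains a manual, per-drive decision
taken only after the data has been safely copied off.

  # list any leftover 0.90 superblocks on the whole-disk devices
  for d in /dev/sd[bcdefghijklm]; do
      echo "== $d =="
      mdadm --examine --metadata=0.90 "$d"
  done

  # then, per drive, only after the data is safely copied off and the
  # output above has been double checked:
  #
  #   mdadm --zero-superblock --metadata=0.90 /dev/sdX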