On Mon, 23 Apr 2012 15:56:16 +0200 Christoph Nelles <evilazrael@xxxxxxxxxxxxx> wrote:

> Hi,
>
> Linux RAID has worked fine for me over the last few years, but yesterday,
> while reorganizing the hardware in my server, the RAID5 crashed. It was a
> software RAID level 5 with 6x 3TB drives and ran XFS on top of it. I
> have no idea why it crashed, but now all superblocks are invalid (one
> dump follows) and sadly I have no information on the RAID disk layout
> (i.e. in which sequence the drives were). All drives from the RAID are
> available and running.
>
> As I cannot afford to buy six more drives to make a backup before trying
> to fix the situation, I need a non-destructive approach to fix the RAID
> configuration and the superblocks.
>
> From my understanding of the RAID5 implementation, the correct order of
> the drives is important.
>
> First question:
> 1) Am I right that the order is important and that I have to try to find
>    the right sequence of drives?
>
> So I would create a loop over all permutations of the drive list and, for
> each permutation:
>  - scrub the superblocks:       mdadm --zero-superblock /dev/sd[bcdefg]1
>  - recreate the RAID5:          mdadm --create /dev/md0 -c 64 -l 5 \
>                                       -n 6 --assume-clean <drive permutation>
>  - run xfs_check to see if it
>    recognizes the FS:           xfs_check -s /dev/md0
>  - stop the RAID:               mdadm --stop /dev/md0
>
> 2) Is that a promising approach to repairing the RAID5 array?
> 3) According to the man page, --assume-clean means that no data is
>    affected unless you write to the array, so this effectively prevents a
>    rebuild? This is important for me, as I don't want to trigger a
>    rebuild, as that will certainly send my data to hell.
> 4) Any other idea for repairing the RAID without losing user data?
>
> Thanks in advance for any answers.
>
>
> Currently the RAID superblocks on each device look like this:
>
> /dev/sdg1:
>           Magic : a92b4efc
>         Version : 1.2
>     Feature Map : 0x0
>      Array UUID : 53a294b5:975244fc:343b0f94:16652fce
>            Name : grml:0
>   Creation Time : Fri Apr 15 20:55:52 2011
>      Raid Level : -unknown-
>    Raid Devices : 0
>
>  Avail Dev Size : 5860529039 (2794.52 GiB 3000.59 GB)
>     Data Offset : 2048 sectors
>    Super Offset : 8 sectors
>           State : active
>     Device UUID : 9688dc72:02140045:c16a2123:4f6cc006
>
>     Update Time : Sun Apr 22 23:56:14 2012
>        Checksum : 350d8d74 - correct
>          Events : 1
>
>
>    Device Role : spare
>    Array State :  ('A' == active, '.' == missing)
>
>
> Interestingly, at the Update Time the system should have been shut down:
>
>   Apr 22 23:55:55 router init: Switching to runlevel: 0
>   [...]
>   Apr 22 23:56:03 router exiting on signal 15
>   Apr 22 23:59:21 router syslogd 1.5.0: restart.
>
> I have really no clue what happened.

This is really worrying.  It's about the 3rd or 4th report recently which
contains:

>      Raid Level : -unknown-
>    Raid Devices : 0

and that should not be possible.  There must be some recent bug that causes
the array to be "cleared" *before* writing out the metadata - and that
should be impossible.

What kernel are you running?

You are correct that the order is important.

Your algorithm looks good (a rough shell sketch of it follows at the end of
this message).  However, I suggest that you first look through your system
logs to see whether

  RAID conf printout:

appears at all.  That could contain the device order.

NeilBrown
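
Below is a rough, untested shell sketch of the permutation loop described in
the question.  Everything in it is an assumption lifted from the original
message (member partitions /dev/sd[b-g]1, 64K chunk, 1.2 metadata, array
device /dev/md0); treat it as an illustration of the approach, not a
ready-made recovery tool.  One caveat: a newer mdadm may choose a data
offset other than the 2048 sectors shown in the superblock dump, in which
case no permutation would ever pass xfs_check.

  #!/bin/bash
  # Brute-force the member order of a 6-disk RAID5 by recreating the array
  # with --assume-clean for every permutation and asking xfs_check whether
  # the result looks like the old filesystem.

  DRIVES=(/dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1 /dev/sdg1)

  # Stop anything that may still be assembled from a previous attempt.
  mdadm --stop /dev/md0 2>/dev/null

  try_order() {
      # $@ holds one candidate drive order.
      mdadm --zero-superblock "$@"
      # --assume-clean skips the initial resync so parity is not rewritten;
      # --run suppresses the "really create?" confirmation prompt.
      mdadm --create /dev/md0 --metadata=1.2 -c 64 -l 5 -n 6 \
            --assume-clean --run "$@" || return
      if xfs_check -s /dev/md0 >/dev/null 2>&1; then
          echo "xfs_check accepts drive order: $*"
      fi
      mdadm --stop /dev/md0
  }

  # Six nested loops over the drive list; orders that repeat a drive are
  # skipped, leaving the 720 true permutations.
  for a in "${DRIVES[@]}"; do
  for b in "${DRIVES[@]}"; do
  for c in "${DRIVES[@]}"; do
  for d in "${DRIVES[@]}"; do
  for e in "${DRIVES[@]}"; do
  for f in "${DRIVES[@]}"; do
      n=$(printf '%s\n' "$a" "$b" "$c" "$d" "$e" "$f" | sort -u | wc -l)
      [ "$n" -eq 6 ] || continue
      try_order "$a" "$b" "$c" "$d" "$e" "$f"
  done; done; done; done; done; done

Any order that xfs_check accepts should still be verified with a read-only
mount (e.g. mount -o ro /dev/md0 /mnt) before anything is written to the
array.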
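
For the log search Neil suggests, something along these lines could work;
the log file locations are an assumption and depend on the distribution and
syslog setup:

  # "RAID conf printout:" is typically followed by one
  # "disk N, o:1, dev:sdX1" line per slot, which gives the original order.
  # zgrep also looks inside rotated, compressed logs.
  zgrep -A 8 'RAID conf printout' /var/log/kern.log* /var/log/messages* 2>/dev/null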