Re: Recreate raid 10 array

LCID Fire <lcid-fire@xxxxxxx> · Thu, 09 Apr 2009 00:14:30 +0200

First off, the good news: I'm currently running on my raid10 again - with 
only a little data loss.

Andrew Burgess wrote:
On Wed, 2009-04-08 at 17:47 -0400, Bill Davidsen wrote:
Goswin von Brederlow wrote:
mdadm --create --assume-clean -l 10 -n 4 /dev/mdX /dev/copied_disk_1 /dev/copied_disk_2 missing missing

You need to match the create parameters exactly with the ones you
initially used (near/offset/farcopies? stripe size? ...) and the order
of devices is relevant so you might have to shuffle the disk
arguments. So just try different orders till the result can be mounted
or fscked. With the wrong options the mount/fsck could screw up the
data but then you copy the disk again for the next try. It should be
reasonably obvious when mount/fsck goes wrong as it should find tons
of errors. Mostly I would expect mount/fsck to just fail with the
wrong mdadm args though.
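
The try-each-order approach described above could be scripted roughly like this. It is only a sketch under assumptions: the device names, /dev/md9, the n2 layout, the 64K chunk and the 0.90 metadata version are placeholders that would have to match the original array, and the filesystem is assumed to be ext3 so that e2fsck applies. By default it only prints the commands (DRY_RUN=1).

```shell
#!/bin/sh
# All names and values below are assumptions for illustration only.
DRY_RUN=${DRY_RUN:-1}            # 1 = print the commands instead of running them
MD=${MD:-/dev/md9}               # hypothetical scratch array name
DISK1=${DISK1:-/dev/copied_disk_1}
DISK2=${DISK2:-/dev/copied_disk_2}
LAYOUT=${LAYOUT:-n2}             # must match the original (near/offset/far copies)
CHUNK=${CHUNK:-64}               # chunk size in KiB, must match the original
METADATA=${METADATA:-0.90}       # superblock version, must match the original

# Run a command, or just print it in dry-run mode.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Print every ordering of the arguments after the first, one per line.
# $1 is the prefix built so far. The body runs in a subshell, so each
# recursion level keeps its own variables.
permute() (
    prefix=$1
    shift
    if [ $# -eq 0 ]; then
        echo "$prefix"
    else
        i=1
        n=$#
        while [ "$i" -le "$n" ]; do
            eval "pick=\${$i}"
            rest=
            j=1
            for a in "$@"; do
                if [ "$j" -ne "$i" ]; then rest="$rest $a"; fi
                j=$((j + 1))
            done
            permute "${prefix:+$prefix }$pick" $rest
            i=$((i + 1))
        done
    fi
)

# Create the array in each distinct device order and check it read-only.
# In dry-run mode every order is reported, because printing always succeeds.
try_orders() {
    permute "" "$DISK1" "$DISK2" missing missing | sort -u |
    while read -r order; do
        # $order is left unquoted on purpose so it splits into four arguments
        run mdadm --create "$MD" --run --assume-clean --level=10 \
            --raid-devices=4 --layout="$LAYOUT" --chunk="$CHUNK" \
            --metadata="$METADATA" $order || continue
        if run e2fsck -n "$MD"; then
            echo "# no errors reported for order: $order"
        fi
        run mdadm --stop "$MD"
    done
}

try_orders
```

With two disks and two "missing" slots there are 12 distinct orders. Setting DRY_RUN=0 makes it really run mdadm, which rewrites the superblocks each time, so it should only ever be pointed at copies.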

Most fscks can be told to run read-only so they won't write to the
device and also interactive so they ask before writing so you should be
able to avoid recopying. The ext3 journal recovery violates at least one
of these IIRC (or used to) so if it's ext3 find an option to tell it to
ignore the journal.
Too late. The journal recovery complained quite a bit, and I didn't 
know better than to let it fix whatever it wanted to fix.
The result shows a problem with the many apps that use sqlite these days 
- they don't cope well when the database file is corrupted.
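
For anyone finding this thread later, the read-only checks suggested above might look like the following. This is a sketch, assuming an ext3 filesystem; /dev/mdX and /mnt/recovery are placeholders. As far as I know, e2fsck -n opens the filesystem read-only and answers "no" to every question, and the "noload" mount option keeps ext3 from replaying the journal. It only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
DRY_RUN=${DRY_RUN:-1}   # 1 = print the commands instead of running them

# Run a command, or just print it in dry-run mode.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Inspect an ext3 filesystem without writing to the device.
# $1 = block device, $2 = mount point (both placeholders here).
check_readonly() {
    dev=$1
    mnt=$2
    # -n: open read-only and answer "no" to all questions
    # -f: check even if the filesystem is marked clean
    run e2fsck -n -f "$dev"
    # ro,noload: mount read-only and do not load (replay) the journal
    run mount -t ext3 -o ro,noload "$dev" "$mnt"
}

check_readonly /dev/mdX /mnt/recovery
```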

May I say that this makes a great case for saving the contents of some 
files to a safe place when the system is up and running right? Maybe 
all of /etc, and at least a "tree /sys" and /proc/mdstat would be 
useful, preferably on something readable like a CD or USB flash drive, 
so you have a chance of reading it if you can't boot.

Of course a rescue flash drive is pretty useful as well, so that's 
probably the way to go.
Quite frankly I don't really care about / - as long as my /home is safe 
- because I can set up my machine again - but losing my work means losing 
far more time.
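
Still, the suggestion above to save the array details while things work is cheap to follow. A minimal sketch, assuming mdadm is installed and the destination is some removable medium; the function name, the file names and the ETC_DIR override are made up for illustration.

```shell
#!/bin/sh
# Save a snapshot of the RAID configuration to a directory.
# $1 = destination directory, remaining arguments = member devices.
# ETC_DIR is a hypothetical override for the directory to archive.
save_raid_info() {
    dest=$1
    shift
    etc=${ETC_DIR:-/etc}
    mkdir -p "$dest" || return 1

    # The kernel's current view of all md arrays
    if [ -r /proc/mdstat ]; then
        cat /proc/mdstat > "$dest/mdstat.txt"
    fi

    if command -v mdadm >/dev/null 2>&1; then
        # One ARRAY line per running array (level, device count, UUID)
        mdadm --detail --scan > "$dest/mdadm-detail-scan.txt" 2>&1 || true
        # Per-member superblock contents (layout, chunk size, slot)
        for dev in "$@"; do
            mdadm --examine "$dev" || true
        done > "$dest/mdadm-examine.txt" 2>&1
    fi

    # Archive the configuration directory; unreadable files are skipped
    tar -czf "$dest/etc.tar.gz" "$etc" 2>/dev/null ||
        echo "warning: tar reported problems archiving $etc" >&2

    date > "$dest/saved-at.txt"
}
```

It would be called as root with something like "save_raid_info /media/usb/raid-info /dev/sda1 /dev/sdb1", where the path and the device names are again only examples.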

It also seems like mdadm could be enhanced to figure stuff like this out
given intact device superblocks (I suggest --wild-ass-guess as the
option name)
That would be great (not that I'm eager to run into that again).

As a side note, I did a binary comparison between the raid1 mirrors and 
was quite shocked. The corrupted one differed in around 1.000.000 bytes - 
something I would expect - but even the valid mirror differed in around 
20.0000 bytes - which I can't explain so easily.
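
In case someone wants to repeat the comparison: one way to count differing bytes is cmp -l, which prints one line per differing byte. A small sketch; the function name and the image file names in the comment are placeholders.

```shell
#!/bin/sh
# Print the number of byte positions at which two files differ.
# If one file is shorter, only the common length is compared
# (cmp's end-of-file message on stderr is discarded).
count_diff_bytes() {
    cmp -l "$1" "$2" 2>/dev/null | wc -l | tr -d ' '
}

# Example with hypothetical image files copied from two mirror halves:
#   count_diff_bytes mirror_a.img mirror_b.img
```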

Anyway - thanks guys for the great help.