Re: recovering from raid5 corruption

On 04/29/2012 09:09 PM, NeilBrown wrote:
On Sun, 29 Apr 2012 20:46:36 -0400 Shaya Potter <spotter@xxxxxxxxx> wrote:

On 04/29/2012 07:45 PM, NeilBrown wrote:
On Sun, 29 Apr 2012 19:29:10 -0400 Shaya Potter <spotter@xxxxxxxxx> wrote:

On 04/29/2012 06:52 PM, NeilBrown wrote:

You've written a new superblock 4K into each device, where previously there
was something.  So you have probably corrupted something, though we cannot
easily tell what.

Retry your experiment with --metadata=0.90.  Hopefully one of those
combinations will work better.  If it does, make a backup of the data you
want to keep, then I would suggest rebuilding the array from scratch.
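
(For anyone reading along in the archive, a minimal sketch of what such a
re-create attempt can look like -- every value here is a placeholder, not
taken from this array:)

# Sketch only: level, chunk size, device count, names and order are
# placeholders and must match the original array.  --assume-clean (not
# spelled out above, but usual for this kind of experiment) keeps mdadm
# from starting a resync while the ordering is still a guess, and 0.90
# metadata lives at the end of each member rather than 4K from the start,
# so it does not overwrite anything near the front of the disks.
mdadm --create /dev/md0 --metadata=0.90 --assume-clean \
      --level=5 --raid-devices=5 --chunk=64 \
      /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1

# Then inspect what the new superblocks report:
mdadm --detail /dev/md0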

ok, thanks, that was a huge help.

I have it set up correctly now (obvious from the fact that I can read the
LVM configuration without any gibberish when the devices are ordered correctly).
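
(A quick way to eyeball that text without activating anything, assuming the
usual layout where the textual LVM metadata sits within the first megabyte
of the PV:)

# Dump the start of the PV and look for the human-readable LVM config;
# readable volume-group text is a good hint that the first member,
# at least, is in the right slot.
dd if=/dev/md0 bs=1M count=1 2>/dev/null | strings | less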

I should add that this only proves that you have the first device correct,
the rest may be wrong.
You need to activate the LVM, then look at the filesystem and see if it is
consistent before you can be sure that all devices are in the correct
position.
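
(A non-destructive way to do that check, with placeholder VG/LV names:)

# Activate the volume group, then run a read-only filesystem check;
# -n makes fsck report problems without fixing (writing) anything.
vgchange -ay myvg
fsck -n /dev/myvg/mylv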

This cheat sheet came in handy:

http://www.datadisk.co.uk/html_docs/redhat/rh_lvm.htm

I did the method at the bottom, "corrupt LVM metadata but replacing the
faulty disk":

Copy/paste the config file out of the beginning of the fs, then:

pvcreate --uuid <uuid for pv0, from config file> /dev/md0
vgcfgrestore -f <config file> <vg name>
vgchange -a y <vg name>
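
(With those done, a quick sanity pass might look like this -- names again
placeholders:)

# Confirm the PV, VG and LVs all show up as expected...
pvs
vgs
lvs
# ...then mount read-only so nothing gets written while checking files.
mount -o ro /dev/myvg/mylv /mnt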

Some cursory testing of large contiguous files that have checksumming
built in seems to indicate that they are all OK.  I probably have other
corruption due to the md 0.90-to-1.2 metadata booboo, but if that's
only 16k-20k (4k * 4 or 5 disks) spread out over 3TB of data, I'm very
happy :)  And it's mostly family photo data, so it's not the biggest deal
as long as the large majority is OK.
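
(Spot checks along these lines, assuming archive or media formats that carry
their own checksums -- file names are placeholders:)

gzip -t some-backup.tar.gz              # exits non-zero on a CRC mismatch
unzip -t photos-2011.zip                # tests every member's CRC
ffmpeg -v error -i some-video.mkv -f null -   # decodes, prints errors, writes nothing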

<phew> relieved.

Excellent.  Thanks for keeping us informed.

If you were using 3.3.1, 3.3.2, or 3.3.3 when this happened, then I know what
caused it and suggest upgrading to 3.3.4.

Don't think so.  The main disk died, so I plugged a new main disk in and installed Ubuntu 12.04 server on it, but it wasn't playing nice, so I turned around and installed Debian squeeze, and that's when I noticed the issue.  Debian is running 2.6.32; Ubuntu is running some 3.something, but I'm unsure which one.

