Re: PROBLEM: RAID5 reshape data corruption

----- Message from nagilum@xxxxxxxxxxx ---------
    Date: Sun, 06 Jan 2008 22:35:46 +0100
    From: Nagilum <nagilum@xxxxxxxxxxx>
Reply-To: Nagilum <nagilum@xxxxxxxxxxx>
 Subject: Re: PROBLEM: RAID5 reshape data corruption
      To: Nagilum <nagilum@xxxxxxxxxxx>
Cc: Neil Brown <neilb@xxxxxxx>, linux-raid@xxxxxxxxxxxxxxx, Dan Williams <dan.j.williams@xxxxxxxxx>, "H. Peter Anvin" <hpa@xxxxxxxxx>


----- Message from nagilum@xxxxxxxxxxx ---------
    Date: Sun, 06 Jan 2008 00:31:46 +0100
    From: Nagilum <nagilum@xxxxxxxxxxx>

At the moment I'm thinking about writing a small perl program that
will generate me a shell script or makefile containing dd commands
that will copy the chunks from the drive to /dev/md0. I don't care if
that will be dog slow as long as I get most of my data back. (I'd
probably go forward instead of backward to take advantage of the
readahead, after I've determined the exact start chunk.)
For that I need to know one more thing.
Used Dev Size is 488308672k for md0 as well as the disk, 16k chunk size.
488308672/16 = 30519292.00
so the first dd would look like:
dd if=/dev/sdg of=/dev/md0 bs=16k count=1 skip=30519291 seek=X
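
The arithmetic above can be double-checked with a small Python sketch. The function names `last_chunk_index` and `dd_command` are made up for illustration and are not taken from any existing script; the seek value stays a parameter because it is the unknown X.

```python
def last_chunk_index(dev_size_kib, chunk_kib):
    """Return the 0-based index of the last whole chunk on a member device."""
    chunks, remainder = divmod(dev_size_kib, chunk_kib)
    if remainder:
        raise ValueError("device size is not a multiple of the chunk size")
    return chunks - 1


def dd_command(src, dst, chunk_kib, skip, seek):
    """Format one dd invocation that copies a single chunk from src to dst."""
    return (f"dd if={src} of={dst} bs={chunk_kib}k count=1 "
            f"skip={skip} seek={seek}")


if __name__ == "__main__":
    skip = last_chunk_index(488308672, 16)
    # "X" is a placeholder: the target chunk number is not known at this point.
    print(dd_command("/dev/sdg", "/dev/md0", 16, skip, "X"))
```

Since skip counts from zero, the last of the 30519292 chunks has index 30519291, which matches the dd line above.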

The big question now is how to calculate X.
Since I have a working testcase I can do a lot of testing before
touching the real thing. The formula to get X will probably contain a
5 for the 5(+1) devices the raid spans now, a 4 for the 4(+1) devices
the raid spanned before the reshape, a 3 for the device number of the
disk that failed and of course the skip/current chunk number.
Can you help me come up with it?
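
One possible starting point for such a formula, as a sketch only: assuming the array uses the md default left-symmetric RAID5 layout, that data starts at offset 0 on each member, that the device number equals the raid slot, and that the geometry is stable (this does not model an interrupted reshape), the array chunk number held at a given chunk of a given member can be computed as below. The function name `array_chunk` is hypothetical, and the result should be verified against the testcase before it is used as a seek value.

```python
def array_chunk(disk, stripe, raid_disks):
    """Map (member disk, chunk index on that disk) to an array chunk number.

    Assumes left-symmetric RAID5 with no data offset. Returns None when the
    chunk at that position holds parity rather than data.
    """
    data_disks = raid_disks - 1
    # In left-symmetric layout, parity starts on the last disk and moves
    # one disk to the left with every stripe.
    parity_disk = data_disks - (stripe % raid_disks)
    if disk == parity_disk:
        return None
    # Data chunks of a stripe start on the disk right after parity and wrap.
    data_index = (disk - parity_disk - 1) % raid_disks
    return stripe * data_disks + data_index


if __name__ == "__main__":
    # 5(+1) devices after the reshape, device number 3, last chunk on the disk.
    print(array_chunk(disk=3, stripe=30519291, raid_disks=6))
```

With bs equal to the chunk size, the returned number would be the seek value only if all of the assumptions above hold for the array in question.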
Thanks again for looking into the whole issue.
----- End message from nagilum@xxxxxxxxxxx -----

Ok, the spare time over the weekend allowed me to make some headway.
I'm not sure if the attachment will make it through to the ML so I
uploaded the perl script to: http://www.nagilum.de/md/rdrep.pl
First tests already show promising results, although I seem to miss the
start of the corruption. Unlike with the testcase, on the real array I
have to start after the area that is unreadable; I already determined
where that area is last Friday.
Anyway I would appreciate it if someone could have a look over the script.
I'll probably change it a little bit and make every other dd run via
exec instead of system to use some parallelism. (I guess the overhead
for running dd will take about as much time as the transfer itself.)
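
The parallelism idea can be sketched as follows. This is not the exec-versus-system approach described above but a swapped-in alternative: a small worker pool that keeps a fixed number of child processes running at once. The function name `run_all` and the `workers` parameter are assumptions made for illustration.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_all(commands, workers=2):
    """Run each command (an argv list) as a child process.

    At most `workers` processes run at the same time. Exit codes are
    returned in the same order as the input commands.
    """
    def run(argv):
        # check=False: a failing command is reported via its exit code
        # instead of raising, so one bad chunk does not stop the rest.
        return subprocess.run(argv, check=False).returncode

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run, commands))
```

Each entry would be one dd invocation split into an argument list; a non-zero exit code marks a chunk that could not be copied.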
----- End message from nagilum@xxxxxxxxxxx -----

I just want to give a quick update.
The program ran for about a day and a half and it looks good: the directories and files appear OK. I'll do some work on it this evening and see if I can restore some more blocks before running xfs_repair.
Kind regards,

========================================================================
#    _  __          _ __     http://www.nagilum.org/ \n icq://69646724 #
#   / |/ /__ ____ _(_) /_ ____ _  nagilum@xxxxxxxxxxx \n +491776461165 #
#  /    / _ `/ _ `/ / / // /  ' \  Amiga (68k/PPC): AOS/NetBSD/Linux   #
# /_/|_/\_,_/\_, /_/_/\_,_/_/_/_/   Mac (PPC): MacOS-X / NetBSD /Linux #
#           /___/     x86: FreeBSD/Linux/Solaris/Win2k  ARM9: EPOC EV6 #
========================================================================


----------------------------------------------------------------
cakebox.homeunix.net - all the machine one needs..


