----- Message from nagilum@xxxxxxxxxxx ---------
Date: Sun, 06 Jan 2008 22:35:46 +0100
From: Nagilum <nagilum@xxxxxxxxxxx>
Reply-To: Nagilum <nagilum@xxxxxxxxxxx>
Subject: Re: PROBLEM: RAID5 reshape data corruption
To: Nagilum <nagilum@xxxxxxxxxxx>
Cc: Neil Brown <neilb@xxxxxxx>, linux-raid@xxxxxxxxxxxxxxx, Dan Williams <dan.j.williams@xxxxxxxxx>, "H. Peter Anvin" <hpa@xxxxxxxxx>
----- Message from nagilum@xxxxxxxxxxx ---------
Date: Sun, 06 Jan 2008 00:31:46 +0100
From: Nagilum <nagilum@xxxxxxxxxxx>

At the moment I'm thinking about writing a small perl program that will generate a shell script or makefile containing dd commands that copy the chunks from the drive to /dev/md0. I don't care if it is dog slow as long as I get most of my data back. (I'd probably go forward instead of backward to take advantage of readahead, once I've determined the exact start chunk.)

For that I need to know one more thing. Used Dev Size is 488308672k for md0 as well as for the disk, with a 16k chunk size. 488308672/16 = 30519292.00, so the first dd would look like:

  dd if=/dev/sdg of=/dev/md0 bs=16k count=1 skip=30519291 seek=X

The big question now is how to calculate X. Since I have a working testcase I can do a lot of testing before touching the real thing. The formula for X will probably contain a 5 for the 5(+1) devices the raid spans now, a 4 for the 4(+1) devices the raid spanned before the reshape, a 3 for the device number of the disk that failed and, of course, the skip/current chunk number. Can you help me come up with it?

Thanks again for looking into the whole issue.
----- End message from nagilum@xxxxxxxxxxx -----

Ok, the spare time over the weekend allowed me to make some headway. I'm not sure if the attachment will make it through to the ML, so I uploaded the perl script to:
http://www.nagilum.de/md/rdrep.pl

First tests already show promising results, although I seem to be missing the start of the corrupted area. Unlike with the testcase, on the real array I have to start after the unreadable area, which I already determined last Friday. Anyway, I would appreciate it if someone could look over the script. I'll probably change it a little and make every other dd run via exec instead of system to get some parallelism. (I guess the overhead of running dd will take about as much time as the transfer itself.)
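On the question of how to calculate X, here is a minimal Python sketch of one possible approach. It is not the logic in rdrep.pl and rests on assumptions: md's default left-symmetric RAID5 layout, member data starting at offset 0 on the device (0.90 superblock at the end), dev_index being the failed disk's slot in the array, and the caller deciding whether a given chunk still lies in the old 4+1 or the new 5+1 geometry. The helper names logical_chunk and dd_command are hypothetical.

```python
# Hypothetical helpers, not taken from rdrep.pl. Assumes md's default
# left-symmetric RAID5 layout and member data starting at offset 0.

def logical_chunk(dev_index, dev_chunk, raid_disks):
    """Return the array (md0) chunk number held by chunk `dev_chunk` of
    member `dev_index`, or None if that chunk is parity."""
    data_disks = raid_disks - 1
    # Each member holds one chunk per stripe, so the chunk number on the
    # member (dd's skip) is also the stripe number.
    stripe = dev_chunk
    # Left-symmetric: parity starts on the last member and moves one
    # member to the left on each successive stripe.
    parity = data_disks - (stripe % raid_disks)
    if dev_index == parity:
        return None
    # Data chunks follow the parity member, wrapping around.
    k = (dev_index - parity - 1) % raid_disks
    return stripe * data_disks + k


def dd_command(src, dst, dev_index, dev_chunk, raid_disks, chunk_kib=16):
    """Build one dd line copying a single chunk, or None for parity."""
    x = logical_chunk(dev_index, dev_chunk, raid_disks)
    if x is None:
        return None
    return (f"dd if={src} of={dst} bs={chunk_kib}k count=1 "
            f"skip={dev_chunk} seek={x}")


if __name__ == "__main__":
    # Member slot 3, last chunk, evaluated for both geometries.
    for disks in (5, 6):
        print(disks, dd_command("/dev/sdg", "/dev/md0", 3, 30519291, disks))
```

Under these assumptions the last chunk of slot 3 is parity in the old 4+1 geometry (the script prints `5 None`), while in the 5+1 geometry it maps to `seek=152596455`. Since dd counts seek in units of bs, X is simply a chunk number on md0.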
----- End message from nagilum@xxxxxxxxxxx -----

I just want to give a quick update. The program ran for about one and a half days and it looks good: the directories and files appear OK. I'll do some more work on it this evening and see if I can restore some more blocks before running xfs_repair.
Kind regards,