----- Message from neilb@xxxxxxx ---------
    Date: Fri, 4 Jan 2008 09:37:24 +1100
    From: Neil Brown <neilb@xxxxxxx>
Reply-To: Neil Brown <neilb@xxxxxxx>
 Subject: Re: PROBLEM: RAID5 reshape data corruption
      To: Nagilum <nagilum@xxxxxxxxxxx>
      Cc: linux-raid@xxxxxxxxxxxxxxx, Dan Williams <dan.j.williams@xxxxxxxxx>, "H. Peter Anvin" <hpa@xxxxxxxxx>
> I'm not just interested in a simple behaviour fix. I'm also interested
> in what actually happens and, if possible, a repair program for that
> kind of data corruption.

What happens is that when a reshape runs while a device is missing, the data on that device should be computed from the other data devices and parity. However, because of the above bug, the data is copied into the new layout before the compute is complete. This means that the data that was on that device is really lost beyond recovery.

I'm really sorry about that, but there is nothing that can be done to recover the lost data.
Thanks a lot, Neil!

I can confirm your findings: the data in the chunks is the data from the broken device. Now to my particular case:
I still have the old disk and I haven't touched the array since. I just ran a dd_rescue -r (reverse) on the old disk, and as I expected most of it (>99%) is still readable. So what I want to do is read the chunks from that disk - starting at the end and working down to the 4% point where the reshape was interrupted by the disk read error - and replace the corresponding chunks on md0.
That should restore most of the data. In order to do so, I need to know how to calculate the different positions of the chunks.
So for the old disk I have:

nas:~# mdadm -E /dev/sdg
/dev/sdg:
          Magic : a92b4efc
        Version : 00.91.00
           UUID : 25da80a6:d56eb9d6:0d7656f3:2f233380
  Creation Time : Sat Sep 15 21:11:41 2007
     Raid Level : raid5
  Used Dev Size : 488308672 (465.69 GiB 500.03 GB)
     Array Size : 2441543360 (2328.44 GiB 2500.14 GB)
   Raid Devices : 6
  Total Devices : 7
Preferred Minor : 0
  Reshape pos'n : 118360960 (112.88 GiB 121.20 GB)
  Delta Devices : 1 (5->6)
    Update Time : Fri Nov 23 20:05:50 2007
          State : active
 Active Devices : 6
Working Devices : 7
 Failed Devices : 0
  Spare Devices : 1
       Checksum : 9a8358c4 - correct
         Events : 0.677965
         Layout : left-symmetric
     Chunk Size : 16K

      Number   Major   Minor   RaidDevice State
this     3       8       96        3      active sync   /dev/sdg
   0     0       8        0        0      active sync   /dev/sda
   1     1       8       16        1      active sync   /dev/sdb
   2     2       8       32        2      active sync   /dev/sdc
   3     3       8       96        3      active sync   /dev/sdg
   4     4       8       64        4      active sync   /dev/sde
   5     5       8       80        5      active sync   /dev/sdf
   6     6       8       48        6      spare         /dev/sdd

The current array is:

nas:~# mdadm -Q --detail /dev/md0
/dev/md0:
        Version : 00.90.03
  Creation Time : Sat Sep 15 21:11:41 2007
     Raid Level : raid5
     Array Size : 2441543360 (2328.44 GiB 2500.14 GB)
  Used Dev Size : 488308672 (465.69 GiB 500.03 GB)
   Raid Devices : 6
  Total Devices : 6
Preferred Minor : 0
    Persistence : Superblock is persistent
    Update Time : Sat Jan  5 17:53:54 2008
          State : clean
 Active Devices : 6
Working Devices : 6
 Failed Devices : 0
  Spare Devices : 0
         Layout : left-symmetric
     Chunk Size : 16K
           UUID : 25da80a6:d56eb9d6:0d7656f3:2f233380
         Events : 0.986918

    Number   Major   Minor   RaidDevice State
       0       8        0        0      active sync   /dev/sda
       1       8       16        1      active sync   /dev/sdb
       2       8       32        2      active sync   /dev/sdc
       3       8       48        3      active sync   /dev/sdd
       4       8       64        4      active sync   /dev/sde
       5       8       80        5      active sync   /dev/sdf

At the moment I'm thinking about writing a small perl program that will generate a shell script or makefile containing dd commands to copy the chunks from the drive to /dev/md0. I don't care if that will be dog slow, as long as I get most of my data back.
(I'd probably go forward instead of backward to take advantage of the readahead, after I've determined the exact start chunk.)
For that I need to know one more thing. Used Dev Size is 488308672k for md0 as well as for the disk, with a 16k chunk size: 488308672/16 = 30519292 chunks per device. So the first dd would look like:

dd if=/dev/sdg of=/dev/md0 bs=16k count=1 skip=30519291 seek=X

The big question is how to calculate X. Since I have a working testcase I can do a lot of testing before touching the real thing. The formula for X will probably involve a 5 for the 5(+1) devices the raid spans now, a 4 for the 4(+1) devices it spanned before the reshape, a 3 for the RaidDevice number of the disk that failed, and of course the skip/current chunk number.
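To make the idea concrete, here is a rough sketch of the mapping I have in mind (in Python rather than perl, just for illustration). It assumes the standard left-symmetric layout as I understand it from the kernel's raid5 code: for stripe i of an n-device array, parity sits on device (n-1) - (i mod n) and the data chunks follow round-robin on the devices after it. The function name is my own and this is untested against the real array:

```python
# Hypothetical sketch: map an on-device chunk of the old 5-device array
# to the array logical chunk, i.e. the dd 'seek' value on /dev/md0.
# Assumption: md left-symmetric RAID5 layout -- for stripe i of an
# n-device array, parity lives on device (n-1) - (i % n) and data
# chunks are placed round-robin on the devices following it.

def chunk_on_md0(dev, dev_chunk, n):
    """dev:       RaidDevice number of the disk being read (3 for old /dev/sdg)
    dev_chunk: chunk index on that device (the dd 'skip' value)
    n:         number of devices in the old array (5 before the reshape)
    Returns the array logical chunk number (the dd 'seek' value),
    or None when that chunk holds parity and carries no file data."""
    parity = (n - 1) - (dev_chunk % n)
    if dev == parity:
        return None                      # parity chunk: nothing to copy
    d = (dev - parity - 1) % n           # logical data index within the stripe
    return dev_chunk * (n - 1) + d       # n-1 data chunks per stripe

if __name__ == "__main__":
    # Last chunks on the old sdg (RaidDevice 3, 5-device array).
    # Stripe 30519291 puts parity on device 3, so only 30519290 yields data.
    for s in (30519291, 30519290):
        x = chunk_on_md0(3, s, 5)
        if x is not None:
            print(f"dd if=/dev/sdg of=/dev/md0 bs=16k count=1 skip={s} seek={x}")
```

If that assumption holds, the logical chunk number can be used directly as the seek value on the new 6-device /dev/md0, since the reshape preserves the logical ordering of the data; that is exactly the kind of thing I would verify on the testcase first.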
Can you help me come up with it? Thanks again for looking into the whole issue.

Alex.

========================================================================
#    _  __          _ __            http://www.nagilum.org/ \n icq://69646724 #
#   / |/ /__ ____ _(_) /_ ____ _    nagilum@xxxxxxxxxxx \n +491776461165      #
#  / / _ `/ _ `/ / / // / ' \       Amiga (68k/PPC): AOS/NetBSD/Linux         #
# /_/|_/\_,_/\_, /_/_/\_,_/_/_/_/   Mac (PPC): MacOS-X / NetBSD /Linux        #
#           /___/                   x86: FreeBSD/Linux/Solaris/Win2k ARM9: EPOC EV6 #
========================================================================
----------------------------------------------------------------
cakebox.homeunix.net - all the machine one needs..