Failed RAID5 array - recovery help


 



Hello,

I've been running a 5-disk RAID5 volume with an ext4 filesystem on a Synology NAS since 2012 without any problems until last week, when bad things happened.

At first, my disk in slot 5 "failed". I'm putting quotation marks here because, as I'll explain below, I later found out that the disk is actually in good shape, so it might have been a controller issue, who knows...

At this point, the array is degraded but still fully working. I don't do anything other than ordering another disk for replacement.

Couple days later, new disk gets delivered. I remove the failed disk from slot 5, put in the new disk and initiate the resync of the volume.

Of course, halfway through, what had to happen happened: I got a URE on the disk in slot 1. That disk was marked failed, and the volume failed as well since 2 disks were now missing.

Now, it's time to think about recovery, because I unfortunately do not have a very recent backup of the data (lesson learned, won't do this ever again).


At this point, I decide to freeze everything before trying anything stupid.

I took all 5 original disks out of the NAS, connected them to a Linux machine, and went through a very lengthy process of running ddrescue to image them all.

 - Slot 5 disk (the first one that failed) happens to read properly, no errors at all...

 - Slot 1 disk (the one that failed next, with the URE) has 2 consecutive sectors (1 KiB) at approx. 60% of the disk that can't be read; all other data reads fine

 - Slots 2, 3 and 4 disks read fine


So, I now have full images of all 5 disks that I can safely work on. They are on an LVM-based volume and I have a snapshot, so I can try and fail with bad mdadm commands and easily go back to the original dumps.


My Events counter on disks looks like this:

root@lab:/# mdadm --examine /mnt/dump2/slot{1,2,3,4,5}.img | grep Event
         Events : 2357031
         Events : 2357038
         Events : 2357041
         Events : 2357044
         Events : 2354905

Disk 5 is way behind, which is normal since the array was kept running for a couple days after that disk failed.

Disks 1, 2, 3 and 4 are all pretty close. The numbers are not exactly the same, but I think this is because I didn't stop the RAID volume before pulling the disks out, so each time a disk was pulled, the Array State in the superblock was updated on the remaining disks (see the Array State lines in the output below). My mistake here, but hopefully not going to be a big deal?
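
To make the gaps easier to see, here is a small sketch that tabulates how far each counter lags behind the highest one. The five values are simply hard-coded from the --examine output above, and the function name event_gaps is made up for illustration:

```shell
#!/bin/sh
# Event counters as reported by mdadm --examine above (slots 1..5).
EVENTS="2357031 2357038 2357041 2357044 2354905"

# event_gaps: print each slot's counter and its distance from the highest one.
event_gaps() {
    max=0
    for e in $EVENTS; do
        if [ "$e" -gt "$max" ]; then max=$e; fi
    done
    slot=1
    for e in $EVENTS; do
        echo "slot$slot events=$e behind=$((max - e))"
        slot=$((slot + 1))
    done
}

event_gaps
```

This shows slots 1-4 within 13 events of each other, while slot 5 is 2139 events behind.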

So, my conclusion at this point is that I probably still have a consistent state across disks 1, 2, 3 and 4, except for a known 1 KiB of corrupted data. That shouldn't be a very big deal: those sectors may not have been used by the filesystem at all, and even if they were, this shouldn't prevent me from recovering most of my files, as long as I can reassemble the volume somehow.
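
One way to check what the unreadable sectors actually hold would be to map a sector offset on a member disk to its stripe, and to see whether that chunk is parity or data. The following is only a sketch: the geometry (data offset 2048 sectors, 64K chunk, 5 devices) comes from the --examine output below, the parity rotation is my understanding of md's left-symmetric layout and should be double-checked, the function name locate is made up, and the two sector numbers are hypothetical placeholders, not values from the ddrescue map.

```shell
#!/bin/sh
# Geometry taken from the mdadm --examine output in this post.
DATA_OFFSET=2048   # sectors before the data area starts on each member
CHUNK=128          # 64K chunk = 128 sectors of 512 bytes
NDEV=5             # raid devices

# locate ROLE DEV_SECTOR
# ROLE is the "Active device N" number; DEV_SECTOR is a 512-byte sector
# offset counted from the start of the member image.
locate() {
    role=$1
    s=$(( $2 - DATA_OFFSET ))
    stripe=$(( s / CHUNK ))
    in_chunk=$(( s % CHUNK ))
    # Assumed left-symmetric rule: parity starts on the last device and
    # moves back by one device per stripe.
    pd=$(( (NDEV - 1) - stripe % NDEV ))
    if [ "$role" -eq "$pd" ]; then
        echo "stripe=$stripe parity"
    else
        # Data chunks start right after the parity device and wrap around.
        idx=$(( (role - pd - 1 + NDEV) % NDEV ))
        arr=$(( (stripe * (NDEV - 1) + idx) * CHUNK + in_chunk ))
        echo "stripe=$stripe data array_sector=$arr"
    fi
}

# Hypothetical sector numbers for illustration only.
locate 0 1166450000
locate 0 1166450128
```

If the real offsets land in a parity chunk, no file data is affected at all. Otherwise, the array sector gives a starting point for finding out which ext4 block (and possibly which file) is involved.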

I was thinking about trying something like:

    mdadm --assemble --assume-clean --level=5 --raid-devices=5 /dev/md0 /dev/loop0 /dev/loop1 /dev/loop2 /dev/loop3 missing

(with /dev/loop0-3 respectively pointing to the images of my disks 1-4, and declaring disk 5 as missing)


I haven't tried this yet. Would this be the right approach? Any other suggestions are welcome.

Thanks in advance.



Pasting below the output of some commands:

root@lab:/# mdadm --examine /mnt/dump2/*.img
/mnt/dump2/slot1.img:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 76ec0964:7491b265:25110f4d:81d88cc3
           Name : NAS:2
  Creation Time : Sat Jan 14 16:49:14 2012
     Raid Level : raid5
   Raid Devices : 5

 Avail Dev Size : 1944080833 (927.01 GiB 995.37 GB)
     Array Size : 7776322048 (3708.04 GiB 3981.48 GB)
  Used Dev Size : 1944080512 (927.01 GiB 995.37 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 89299ff0:6fa8ac04:0beea54f:bc0674c8

    Update Time : Tue Aug 28 22:16:42 2018
       Checksum : 5d04dd8d - correct
         Events : 2357031

         Layout : left-symmetric
     Chunk Size : 64K

   Device Role : Active device 0
   Array State : AAAAA ('A' == active, '.' == missing)
/mnt/dump2/slot2.img:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 76ec0964:7491b265:25110f4d:81d88cc3
           Name : NAS:2
  Creation Time : Sat Jan 14 16:49:14 2012
     Raid Level : raid5
   Raid Devices : 5

 Avail Dev Size : 1944080833 (927.01 GiB 995.37 GB)
     Array Size : 7776322048 (3708.04 GiB 3981.48 GB)
  Used Dev Size : 1944080512 (927.01 GiB 995.37 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 6d56ce4e:f49d35da:96069592:056b4055

    Update Time : Tue Aug 28 22:22:19 2018
       Checksum : 60737dfa - correct
         Events : 2357038

         Layout : left-symmetric
     Chunk Size : 64K

   Device Role : Active device 1
   Array State : .AAA. ('A' == active, '.' == missing)
/mnt/dump2/slot3.img:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 76ec0964:7491b265:25110f4d:81d88cc3
           Name : NAS:2
  Creation Time : Sat Jan 14 16:49:14 2012
     Raid Level : raid5
   Raid Devices : 5

 Avail Dev Size : 1944080833 (927.01 GiB 995.37 GB)
     Array Size : 7776322048 (3708.04 GiB 3981.48 GB)
  Used Dev Size : 1944080512 (927.01 GiB 995.37 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 3a8a9c7d:8711e931:3b64eee5:fd9461c9

    Update Time : Sat Sep  1 21:56:49 2018
       Checksum : ae71ed02 - correct
         Events : 2357041

         Layout : left-symmetric
     Chunk Size : 64K

   Device Role : Active device 2
   Array State : ..AA. ('A' == active, '.' == missing)
/mnt/dump2/slot4.img:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 76ec0964:7491b265:25110f4d:81d88cc3
           Name : NAS:2
  Creation Time : Sat Jan 14 16:49:14 2012
     Raid Level : raid5
   Raid Devices : 5

 Avail Dev Size : 1944080833 (927.01 GiB 995.37 GB)
     Array Size : 7776322048 (3708.04 GiB 3981.48 GB)
  Used Dev Size : 1944080512 (927.01 GiB 995.37 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 31f3790a:85db548a:a84d2754:c75854e8

    Update Time : Sun Sep  2 06:38:53 2018
       Checksum : 20e8478a - correct
         Events : 2357044

         Layout : left-symmetric
     Chunk Size : 64K

   Device Role : Active device 3
   Array State : ...A. ('A' == active, '.' == missing)
/mnt/dump2/slot5.img:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 76ec0964:7491b265:25110f4d:81d88cc3
           Name : NAS:2
  Creation Time : Sat Jan 14 16:49:14 2012
     Raid Level : raid5
   Raid Devices : 5

 Avail Dev Size : 1944080833 (927.01 GiB 995.37 GB)
     Array Size : 7776322048 (3708.04 GiB 3981.48 GB)
  Used Dev Size : 1944080512 (927.01 GiB 995.37 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : e046df58:28bb1715:160ed2d5:6e2aae94

    Update Time : Fri Aug 24 22:00:11 2018
       Checksum : 2810ff0a - correct
         Events : 2354905

         Layout : left-symmetric
     Chunk Size : 64K

   Device Role : Active device 4
   Array State : AAAAA ('A' == active, '.' == missing)






