Re: Help restoring a raid10 Array (4 disk + one spare) after a hard disk failure at power on

Hi !

I just ran a test for you (and to be sure myself for the future, and before saying anything wrong).

Created four 1536 MB files (in an 8 GB RAM disk, so it's much faster), plus one more for the coming replacement:

dd if=/dev/zero of=R1.img bs=1024k status=progress count=1536
dd if=/dev/zero of=R2.img bs=1024k status=progress count=1536
dd if=/dev/zero of=R3.img bs=1024k status=progress count=1536
dd if=/dev/zero of=R4.img bs=1024k status=progress count=1536
dd if=/dev/zero of=R5.img bs=1024k status=progress count=1536
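
For reference, the RAM disk is just a RAM-backed filesystem; if you want to reproduce the test, a tmpfs mount is one way to get it (size and mount point are up to you):

mkdir -p /home/user/RAMFS
mount -t tmpfs -o size=8G tmpfs /home/user/RAMFS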

Made them available as block devices:

losetup /dev/loop0 /home/user/RAMFS/R1.img
losetup /dev/loop1 /home/user/RAMFS/R2.img
losetup /dev/loop2 /home/user/RAMFS/R3.img
losetup /dev/loop3 /home/user/RAMFS/R4.img
losetup /dev/loop4 /home/user/RAMFS/R5.img
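
Side note: losetup can also pick the next free loop device for you and print it, so you don't have to hard-code loop0..loop4:

losetup -f --show /home/user/RAMFS/R1.img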

Created my RAID 10 with 4 disks:
mdadm --create /dev/md0 --level=10 --raid-devices=4 /dev/loop0 /dev/loop1 /dev/loop2 /dev/loop3

Which gives:

root@Octocrobe:/home/user/RAMFS# mdadm --detail /dev/md0
/dev/md0:
        Version : 1.2
  Creation Time : Tue May 14 17:59:28 2019
     Raid Level : raid10
     Array Size : 3143680 (3.00 GiB 3.22 GB)
  Used Dev Size : 1571840 (1535.00 MiB 1609.56 MB)
   Raid Devices : 4
  Total Devices : 4
    Persistence : Superblock is persistent

    Update Time : Tue May 14 18:05:55 2019
          State : clean
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0

         Layout : near=2
     Chunk Size : 512K

           Name : Octocrobe:0  (local to host Octocrobe)
           UUID : f9876be6:2a574cf3:3824348e:2438479e
         Events : 17

    Number   Major   Minor   RaidDevice State
       0       7        0        0      active sync set-A   /dev/loop0
       1       7        1        1      active sync set-B   /dev/loop1
       2       7        2        2      active sync set-A   /dev/loop2
       3       7        3        3      active sync set-B   /dev/loop3
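
Note: before writing test data, it's worth letting the initial resync finish; you can watch it with:

cat /proc/mdstat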

I wrote random data to the RAID (do not do this on a valuable filesystem or data!) so I could check data integrity later.
shred -n 1 -v /dev/md0

Checked the resulting SHA-1 of the data on the array, then of each member's data area (the data offset is 2048 sectors, so 1 MiB to skip):
887db7d3f046242e9f99dd330cc628d2b3f7a5f9  /dev/md0

dd if=/home/user/RAMFS/R1.img bs=1024k skip=1 | sha1sum: 954497d3e591bdb3998e5ccf35a639d2c2894bb8
dd if=/home/user/RAMFS/R2.img bs=1024k skip=1 | sha1sum: 954497d3e591bdb3998e5ccf35a639d2c2894bb8
dd if=/home/user/RAMFS/R3.img bs=1024k skip=1 | sha1sum: 1fad602e67a218391406dafa04c7ecba000ecaf7
dd if=/home/user/RAMFS/R4.img bs=1024k skip=1 | sha1sum: 1fad602e67a218391406dafa04c7ecba000ecaf7

For the not-yet-used one (R5.img): 23543e573e0e1fbfcec24dc7441395c50dd61bd6
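
The same member checks can be done in a small loop if you prefer (a sketch; the 1 MiB skip matches the 2048-sector data offset above):

for f in R1 R2 R3 R4 R5; do
    echo -n "$f.img: "
    dd if=/home/user/RAMFS/$f.img bs=1024k skip=1 2>/dev/null | sha1sum
done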

Failed the 3rd member:
mdadm --manage /dev/md0 --fail /dev/loop2
Details:

    Number   Major   Minor   RaidDevice State
       0       7        0        0      active sync set-A   /dev/loop0
       1       7        1        1      active sync set-B   /dev/loop1
       -       0        0        2      removed
       3       7        3        3      active sync set-B   /dev/loop3

       2       7        2        -      faulty   /dev/loop2
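
Optionally, the faulty member can also be dropped from the array explicitly (I did not do it in this test):

mdadm --manage /dev/md0 --remove /dev/loop2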

Not mandatory: I stopped and reassembled the array, just to see:
mdadm --stop /dev/md0
mdadm --assemble /dev/md0 /dev/loop0 /dev/loop1 /dev/loop2 /dev/loop3
mdadm: /dev/md0 has been started with 3 drives (out of 4).

Details: /dev/loop2 disappeared from mdadm --detail, but the array still works exactly the same way.
    Number   Major   Minor   RaidDevice State
       0       7        0        0      active sync set-A   /dev/loop0
       1       7        1        1      active sync set-B   /dev/loop1
       -       0        0        2      removed
       3       7        3        3      active sync set-B   /dev/loop3

Added the new one:

mdadm --manage /dev/md0 --add /dev/loop4

Details:

 Rebuild Status : 78% complete

           Name : Octocrobe:0  (local to host Octocrobe)
           UUID : b81ed231:11636921:fb6e5c51:7003143d
         Events : 36

    Number   Major   Minor   RaidDevice State
       0       7        0        0      active sync set-A   /dev/loop0
       1       7        1        1      active sync set-B   /dev/loop1
       4       7        4        2      spare rebuilding   /dev/loop4
       3       7        3        3      active sync set-B   /dev/loop3
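
If you want to block until the rebuild is done instead of polling mdadm --detail, mdadm can wait for it:

mdadm --wait /dev/md0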

Once the rebuild finished, the new member's data matched the expected value:
dd if=/home/user/RAMFS/R5.img bs=1024k skip=1 | sha1sum: 1fad602e67a218391406dafa04c7ecba000ecaf7

root@Octocrobe:/home/user/RAMFS# mdadm --detail /dev/md0
/dev/md0:
        Version : 1.2
  Creation Time : Tue May 14 17:59:28 2019
     Raid Level : raid10
     Array Size : 3143680 (3.00 GiB 3.22 GB)
  Used Dev Size : 1571840 (1535.00 MiB 1609.56 MB)
   Raid Devices : 4
  Total Devices : 5
    Persistence : Superblock is persistent

    Update Time : Tue May 14 18:29:01 2019
          State : clean
 Active Devices : 4
Working Devices : 4
 Failed Devices : 1
  Spare Devices : 0

         Layout : near=2
     Chunk Size : 512K

           Name : Octocrobe:0  (local to host Octocrobe)
           UUID : f9876be6:2a574cf3:3824348e:2438479e
         Events : 37

    Number   Major   Minor   RaidDevice State
       0       7        0        0      active sync set-A   /dev/loop0
       1       7        1        1      active sync set-B   /dev/loop1
       4       7        4        2      active sync set-A   /dev/loop4
       3       7        3        3      active sync set-B   /dev/loop3
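
To be extra sure, the array-level checksum can be recomputed after the rebuild; it should still match the value recorded right after the shred:

sha1sum /dev/md0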

Another way is:

mdadm /dev/md0 --re-add /dev/yourFailedDrive
mdadm --manage /dev/md0 --add /dev/yourNewDrive --replace /dev/yourFailedDrive --with /dev/yourNewDrive

At the end, the new drive's content matches too. This way is supposed to read from the drive being replaced instead of from the other members, but I guess your failed drive won't be available anymore ;) so it may not be useful in your case.
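
If you prefer doing that in two separate steps (device names are placeholders, of course): first add the new drive as a spare, then ask md to replace the failing one with it:

mdadm --manage /dev/md0 --add /dev/yourNewDrive
mdadm --manage /dev/md0 --replace /dev/yourFailedDrive --with /dev/yourNewDrive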

Good luck!
Julien



On 5/14/19 5:48 PM, Eric Valette wrote:
I have a dedicated hardware NAS that runs a self-maintained Debian 10.

Disk status before/after the hardware disk problem:

sda : system disk OK/OK no raid
sdb : first disk of the raid10 array OK/OK
sdc : second disk of the raid10 array OK/OK
sdd : third disk of the raid10 array OK/KO
sde : fourth disk of the raid10 array OK/OK but is now sdd
sdf : spare disk for the array is now sde

After the failure the BIOS does not detect the original third disk. The disks are renamed, and I think sde has become sdd and sdf -> sde.

Below is more detailed info. Feel free to ask for other things, as I can log into the machine via ssh.

So I have several questions:

    1) How do I repair the raid10 array using the spare disk without replacing the faulty one immediately?
    2) What should I do once I receive the new disk (hopefully soon)?
    3) Is there a way to use persistent naming for the disk array?

Sorry to bother you, but my kid wants to watch a film on the NAS and is pestering me badly. And I prefer to ask rather than make mistakes.

Thanks for any help.



root@nas2:~# mdadm --examine /dev/sdb
/dev/sdb:
    MBR Magic : aa55
Partition[0] :   4294967295 sectors at            1 (type ee)
root@nas2:~# mdadm --examine /dev/sdb1
/dev/sdb1:
           Magic : a92b4efc
         Version : 1.2
     Feature Map : 0x0
      Array UUID : 6abe1f20:90c629de:fadd8dc0:ca14c928
            Name : nas2:0  (local to host nas2)
   Creation Time : Wed Jun 20 23:56:59 2012
      Raid Level : raid10
    Raid Devices : 4

  Avail Dev Size : 5860268943 (2794.39 GiB 3000.46 GB)
      Array Size : 5860268032 (5588.79 GiB 6000.91 GB)
   Used Dev Size : 5860268032 (2794.39 GiB 3000.46 GB)
     Data Offset : 262144 sectors
    Super Offset : 8 sectors
    Unused Space : before=262064 sectors, after=911 sectors
           State : clean
     Device UUID : ce9d878a:37a4f3a3:936bd905:c4ed9970

     Update Time : Wed May  8 11:39:40 2019
        Checksum : cf841c9f - correct
          Events : 1193

          Layout : near=2
      Chunk Size : 512K

    Device Role : Active device 0
    Array State : AAAA ('A' == active, '.' == missing, 'R' == replacing)
root@nas2:~# mdadm --examine /dev/sdc
/dev/sdc:
    MBR Magic : aa55
Partition[0] :   4294967295 sectors at            1 (type ee)
root@nas2:~# mdadm --examine /dev/sdc1
/dev/sdc1:
           Magic : a92b4efc
         Version : 1.2
     Feature Map : 0x0
      Array UUID : 6abe1f20:90c629de:fadd8dc0:ca14c928
            Name : nas2:0  (local to host nas2)
   Creation Time : Wed Jun 20 23:56:59 2012
      Raid Level : raid10
    Raid Devices : 4

  Avail Dev Size : 5860268943 (2794.39 GiB 3000.46 GB)
      Array Size : 5860268032 (5588.79 GiB 6000.91 GB)
   Used Dev Size : 5860268032 (2794.39 GiB 3000.46 GB)
     Data Offset : 262144 sectors
    Super Offset : 8 sectors
    Unused Space : before=262064 sectors, after=911 sectors
           State : clean
     Device UUID : 8c89bdf8:4f3f8ace:c15b5634:7a874071

     Update Time : Wed May  8 11:39:40 2019
        Checksum : 97744edb - correct
          Events : 1193

          Layout : near=2
      Chunk Size : 512K

    Device Role : Active device 1
    Array State : AAAA ('A' == active, '.' == missing, 'R' == replacing)
root@nas2:~# mdadm --examine /dev/sdd
/dev/sdd:
    MBR Magic : aa55
Partition[0] :   4294967295 sectors at            1 (type ee)
root@nas2:~# mdadm --examine /dev/sdd1
/dev/sdd1:
           Magic : a92b4efc
         Version : 1.2
     Feature Map : 0x0
      Array UUID : 6abe1f20:90c629de:fadd8dc0:ca14c928
            Name : nas2:0  (local to host nas2)
   Creation Time : Wed Jun 20 23:56:59 2012
      Raid Level : raid10
    Raid Devices : 4

  Avail Dev Size : 5860268943 (2794.39 GiB 3000.46 GB)
      Array Size : 5860268032 (5588.79 GiB 6000.91 GB)
   Used Dev Size : 5860268032 (2794.39 GiB 3000.46 GB)
     Data Offset : 262144 sectors
    Super Offset : 8 sectors
    Unused Space : before=262064 sectors, after=911 sectors
           State : clean
     Device UUID : c97b767a:84d2e7e2:52557d30:51c39784

     Update Time : Wed May  8 11:39:40 2019
        Checksum : 3d08e837 - correct
          Events : 1193

          Layout : near=2
      Chunk Size : 512K

    Device Role : Active device 3
    Array State : AAAA ('A' == active, '.' == missing, 'R' == replacing)
root@nas2:~# mdadm --examine /dev/sde
/dev/sde:
    MBR Magic : aa55
Partition[0] :   4294967295 sectors at            1 (type ee)
root@nas2:~# mdadm --examine /dev/sde1
/dev/sde1:
           Magic : a92b4efc
         Version : 1.2
     Feature Map : 0x0
      Array UUID : 6abe1f20:90c629de:fadd8dc0:ca14c928
            Name : nas2:0  (local to host nas2)
   Creation Time : Wed Jun 20 23:56:59 2012
      Raid Level : raid10
    Raid Devices : 4

  Avail Dev Size : 5860268943 (2794.39 GiB 3000.46 GB)
      Array Size : 5860268032 (5588.79 GiB 6000.91 GB)
   Used Dev Size : 5860268032 (2794.39 GiB 3000.46 GB)
     Data Offset : 262144 sectors
    Super Offset : 8 sectors
    Unused Space : before=262064 sectors, after=911 sectors
           State : clean
     Device UUID : 82667e81:a6158319:85e0282e:845eec1c

     Update Time : Wed May  8 11:00:29 2019
        Checksum : 10ac3349 - correct
          Events : 1193

          Layout : near=2
      Chunk Size : 512K

    Device Role : spare
    Array State : AAAA ('A' == active, '.' == missing, 'R' == replacing)
root@nas2:~#

mdadm --detail /dev/md0
/dev/md0:
            Version : 1.2
         Raid Level : raid0
      Total Devices : 4
        Persistence : Superblock is persistent

              State : inactive
    Working Devices : 4

               Name : nas2:0  (local to host nas2)
               UUID : 6abe1f20:90c629de:fadd8dc0:ca14c928
             Events : 1193

     Number   Major   Minor   RaidDevice

        -       8       65        -        /dev/sde1
        -       8       49        -        /dev/sdd1
        -       8       33        -        /dev/sdc1
        -       8       17        -        /dev/sdb1

cat /proc/mdstat
Personalities : [raid10]
md0 : inactive sdc1[1](S) sdb1[0](S) sde1[4](S) sdd1[3](S)
       11720537886 blocks super 1.2

unused devices: <none>



