Hi all,

I have some trouble recreating a likely undamaged RAID5 array with 4 disks. I initially created the array one year ago:

mdadm --create /dev/md0 --level=5 --chunk=128 --raid-devices=4 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1

/dev/md0 is the base for a LUKS container, which is used as a physical volume for LVM. Recently I moved the array to a new machine, and everything kept working fine. Then I experimented with hdparm to put the drives into sleep mode. After a few sleep/wake cycles everything was still working fine. On Sunday an event occurred where 2 of the 4 disks didn't wake up properly. I woke the array up by issuing this command:

dd if=/dev/zero bs=512 count=1 > mnt/tmp/test.img

(mnt/tmp resides on the array/LUKS/LVM stack.) I cannot remember whether an error message appeared, but I saw that 2 drives were no longer accessible. I decided to restart the machine to get the drives working again. After the restart all 4 drives were working, but the array didn't come up because of 2 missing drives. After some googling I found

https://raid.wiki.kernel.org/index.php/RAID_Recovery

This guide seems to deal with my problem. I tried recreating my array with these steps:

1. mdadm --examine /dev/sd[acde]1 > raid.status

raid.status contains this:

/dev/sda1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 65089aca:483766d3:0db55271:27c73384
           Name : oldman:0
  Creation Time : Sun Jun 26 14:35:51 2011
     Raid Level : raid5
   Raid Devices : 4
 Avail Dev Size : 3907024896 (1863.01 GiB 2000.40 GB)
     Array Size : 5860536576 (5589.04 GiB 6001.19 GB)
  Used Dev Size : 3907024384 (1863.01 GiB 2000.40 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 3d7a888c:7488a859:f53a0fe5:5bbc1bda
    Update Time : Sat Jul 14 22:08:08 2012
       Checksum : eeb6d1f1 - correct
         Events : 266
         Layout : left-symmetric
     Chunk Size : 128K
    Device Role : Active device 3
    Array State : AAAA ('A' == active, '.' == missing)

/dev/sdc1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 65089aca:483766d3:0db55271:27c73384
           Name : oldman:0
  Creation Time : Sun Jun 26 14:35:51 2011
     Raid Level : raid5
   Raid Devices : 4
 Avail Dev Size : 3907024896 (1863.01 GiB 2000.40 GB)
     Array Size : 5860536576 (5589.04 GiB 6001.19 GB)
  Used Dev Size : 3907024384 (1863.01 GiB 2000.40 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 578cb781:387a3537:e42e9414:3d6e25e7
    Update Time : Sat Jul 14 22:08:08 2012
       Checksum : fe017b9c - correct
         Events : 266
         Layout : left-symmetric
     Chunk Size : 128K
    Device Role : Active device 1
    Array State : AAAA ('A' == active, '.' == missing)

/dev/sdd1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 65089aca:483766d3:0db55271:27c73384
           Name : oldman:0
  Creation Time : Sun Jun 26 14:35:51 2011
     Raid Level : raid5
   Raid Devices : 4
 Avail Dev Size : 3907024896 (1863.01 GiB 2000.40 GB)
     Array Size : 5860536576 (5589.04 GiB 6001.19 GB)
  Used Dev Size : 3907024384 (1863.01 GiB 2000.40 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 3dc3e9e2:ab9a2199:5618c6cd:ca0631fa
    Update Time : Sun Jul 15 11:25:20 2012
       Checksum : 8d5c0fd0 - correct
         Events : 271
         Layout : left-symmetric
     Chunk Size : 128K
    Device Role : Active device 2
    Array State : A.A. ('A' == active, '.' == missing)

/dev/sde1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 65089aca:483766d3:0db55271:27c73384
           Name : oldman:0
  Creation Time : Sun Jun 26 14:35:51 2011
     Raid Level : raid5
   Raid Devices : 4
 Avail Dev Size : 3907024896 (1863.01 GiB 2000.40 GB)
     Array Size : 5860536576 (5589.04 GiB 6001.19 GB)
  Used Dev Size : 3907024384 (1863.01 GiB 2000.40 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 22ecbee2:a26678a3:c2aeca18:007edd48
    Update Time : Sun Jul 15 11:25:20 2012
       Checksum : 3139124b - correct
         Events : 271
         Layout : left-symmetric
     Chunk Size : 128K
    Device Role : Active device 0
    Array State : A.A. ('A' == active, '.' == missing)

2. grep Role raid.status gives:

    Device Role : Active device 3
    Device Role : Active device 1
    Device Role : Active device 2
    Device Role : Active device 0
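Sorted by role number, this gives the original device order /dev/sde1, /dev/sdc1, /dev/sdd1, /dev/sda1 (roles 0 to 3), which is the order I used in the next step. Just as a rough helper sketch (not from the guide), the mapping can be printed directly like this:

for d in /dev/sd[acde]1; do
    echo "$(mdadm --examine "$d" | awk '/Device Role/ {print $NF}') $d"
done | sort -n

which prints one "role device" pair per line, sorted by role.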
3. mdadm --create --assume-clean --level=5 --chunk=128 --raid-devices=4 /dev/md0 /dev/sde1 /dev/sdc1 /dev/sdd1 /dev/sda1

This gave the message that all drives appear to be part of a RAID array. I confirmed with "Continue creating array" and /dev/md0 was created. But the data in /dev/md0 is corrupted: I cannot luksOpen the device, and the first bytes of /dev/md0 do not contain the LUKS header. Something went wrong in recreating the array.

I've built a simple script that tries to create the array and luksOpen it with all 24 possible permutations of the disk order, but none of them brings my data back.
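In case it is useful, this is roughly the idea of that script (a minimal sketch, not my exact code; it only checks the first 4 bytes of /dev/md0 for the LUKS magic instead of doing a full luksOpen):

#!/bin/sh
# Brute-force sketch: recreate the array with every possible order of the four
# partitions and look for the LUKS magic ("LUKS") at the start of /dev/md0.
DEVS="/dev/sda1 /dev/sdc1 /dev/sdd1 /dev/sde1"

for a in $DEVS; do
 for b in $DEVS; do
  for c in $DEVS; do
   for d in $DEVS; do
    # use each device exactly once per attempt (24 valid orders)
    if [ "$a" = "$b" ] || [ "$a" = "$c" ] || [ "$a" = "$d" ] || \
       [ "$b" = "$c" ] || [ "$b" = "$d" ] || [ "$c" = "$d" ]; then
     continue
    fi

    mdadm --stop /dev/md0 2>/dev/null
    # --run suppresses the "Continue creating array?" question
    mdadm --create /dev/md0 --assume-clean --run --level=5 --chunk=128 \
          --raid-devices=4 "$a" "$b" "$c" "$d"

    magic=$(dd if=/dev/md0 bs=4 count=1 2>/dev/null)
    echo "$a $b $c $d -> $magic"
    if [ "$magic" = "LUKS" ]; then
     echo "LUKS header found with order: $a $b $c $d"
     exit 0
    fi
   done
  done
 done
done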
I'm pretty sure that I didn't write anything to the array, except the 512 bytes to wake it up from sleep (see above).

The kernel messages from around the time the drives didn't respond are:

Jul 15 11:21:12 EBENE kernel: [30046.863193] sd 2:0:0:0: [sdc] Unhandled error code
Jul 15 11:21:12 EBENE kernel: [30046.863197] sd 2:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jul 15 11:21:12 EBENE kernel: [30046.863201] sd 2:0:0:0: [sdc] CDB: Read(10): 28 00 2f 91 75 00 00 00 a0 00
Jul 15 11:21:12 EBENE kernel: [30046.863374] sd 0:0:0:0: [sda] Unhandled error code
Jul 15 11:21:12 EBENE kernel: [30046.863376] sd 0:0:0:0: [sda] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jul 15 11:21:12 EBENE kernel: [30046.863378] sd 0:0:0:0: [sda] CDB: Read(10): 28 00 2f 91 75 00 00 00 a0 00
Jul 15 11:21:12 EBENE kernel: [30046.864442] sd 0:0:0:0: [sda] Unhandled error code
Jul 15 11:21:12 EBENE kernel: [30046.864445] sd 0:0:0:0: [sda] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jul 15 11:21:12 EBENE kernel: [30046.864449] sd 0:0:0:0: [sda] CDB: Read(10): 28 00 2f 91 76 00 00 00 a0 00
Jul 15 11:21:12 EBENE kernel: [30046.864508] sd 2:0:0:0: [sdc] Unhandled error code
Jul 15 11:21:12 EBENE kernel: [30046.864510] sd 2:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jul 15 11:21:12 EBENE kernel: [30046.864513] sd 2:0:0:0: [sdc] CDB: Read(10): 28 00 2f 91 75 a0 00 00 60 00
Jul 15 11:21:12 EBENE kernel: [30046.864638] sd 0:0:0:0: [sda] Unhandled error code
Jul 15 11:21:12 EBENE kernel: [30046.864640] sd 0:0:0:0: [sda] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jul 15 11:21:12 EBENE kernel: [30046.864642] sd 0:0:0:0: [sda] CDB: Read(10): 28 00 2f 91 75 a0 00 00 60 00
Jul 15 11:21:12 EBENE kernel: [30046.864680] sd 2:0:0:0: [sdc] Unhandled error code
Jul 15 11:21:12 EBENE kernel: [30046.864681] sd 2:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jul 15 11:21:12 EBENE kernel: [30046.864683] sd 2:0:0:0: [sdc] CDB: Read(10): 28 00 2f 91 76 00 00 00 a0 00
Jul 15 11:23:38 EBENE kernel: [30192.746530] sd 0:0:0:0: [sda] Unhandled error code
Jul 15 11:23:38 EBENE kernel: [30192.746534] sd 0:0:0:0: [sda] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jul 15 11:23:38 EBENE kernel: [30192.746539] sd 0:0:0:0: [sda] CDB: Read(10): 28 00 00 00 00 00 00 00 20 00
Jul 15 11:23:38 EBENE kernel: [30192.746810] sd 0:0:0:0: [sda] Unhandled error code
Jul 15 11:23:38 EBENE kernel: [30192.746812] sd 0:0:0:0: [sda] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jul 15 11:23:38 EBENE kernel: [30192.746815] sd 0:0:0:0: [sda] CDB: Read(10): 28 00 00 00 00 00 00 00 08 00

I have modified the Permute_array.pl from the wiki page to not mount /dev/md0 but instead print out the first 4 bytes of /dev/md0. These must be "LUKS" if the array is recreated successfully, but I didn't get this result for any of the permutations.

I'm afraid I have killed my array by not using "missing" for one of the drives in my first attempt (see point 3 above). Will my array remain corrupted? Any suggestions?

Thanks in advance,
André