RE: Data recovery after the failure of two disks of 4


 



Thanks a lot, Neil.

I'll keep you updated: I'm waiting for the warranty replacements...


Giulio Carabetta 
Ufficio Sistemi Informativi
Tel. +39 066767733
Cell. 3489025709
Fax. 0667678028
g.carabetta@xxxxxx
www.abi.it


Before printing this email, think about the environment.
The content and attachments of this message are strictly confidential; unauthorized disclosure or use is forbidden.
Any opinions expressed are those of the author: the message therefore does not constitute a contractual commitment between ABI and the recipient,
and the Association accepts no responsibility for the content of the text and its attachments, nor for any interception,
modification or damage. If you have received this message in error, please destroy it and report the incorrect delivery to postmaster@xxxxxx.
________________________________

 


-----Original Message-----
From: NeilBrown [mailto:neilb@xxxxxxx] 
Sent: Tuesday, September 11, 2012 3:03 AM
To: Carabetta Giulio
Cc: 'linux-raid@xxxxxxxxxxxxxxx'
Subject: Re: Data recovery after the failure of two disks of 4

On Wed, 5 Sep 2012 15:34:00 +0200 Carabetta Giulio <g.carabetta@xxxxxx> wrote:

> I'm trying to recover a RAID 5 array after the failure of two disks out of 4.
> "Simply" put, the controller lost one disk and, after a couple of minutes, it lost another one.
> The disk also disappeared on me while I was trying to pull the data off it, so I guess it is a problem with the disks' control boards...
> 
> However, the server was not doing anything special at the time of the fault, so the critical data should still be there, on the surface of the disks...
> 
> Anyhow, I have two good disks and two faulty ones.
> 
> More specifically, the disks (4 identical 2TB WD20EARS) are all partitioned in the same way: the first partition is about 250 MB, and the second takes the rest of the free space.
> - sda1 and sdb1 as md0 (raid1) with /boot
> - sdc1 and sdd1 as md2 (raid1) with swaps
> - sd[abcd]2 as md1 (RAID5) with root partition.
> 
> Swap is not a concern, and the boot array has no problems. The first time I hit the failure the machine didn't boot simply because the BIOS did not see the disks (both of those holding the boot partition...), but that was a temporary error...
> 
> The first disk to fail was sdb, and the second was sda: I'm guessing this
> from the differences between the superblocks (the full dump of the
> superblocks is appended at the end of this message):
> 
> ---
> sda2:
>         Update Time: Mon Aug 27 20:46:05 2012
>              Events: 622
>        Array State: A.AA ('A' == active, '.' == missing)
> 
> sdb2:
>         Update Time: Mon Aug 27 20:44:22 2012
>              Events: 600
>        Array State: AAAA ('A' == active, '.' == missing)
> 
> sdc2:
>         Update Time: Mon Aug 27 20:46:33 2012
>              Events: 625
>        Array State: ..AA ('A' == active, '.' == missing)
> 
> sdd2:
>         Update Time: Mon Aug 27 20:46:33 2012
>              Events: 625
>        Array State: ..AA ('A' == active, '.' == missing)
> ---
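> 
> (As a side note, that summary can be regenerated in one go - assuming mdadm
> and grep are at hand - with something like:
> 
>     mdadm --examine /dev/sd[abcd]2 | grep -E '/dev/sd|Update Time|Events|Array State'
> 
> which keeps only the per-device header and the three fields of interest.)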
> 
> Now I'm copying the partitions elsewhere with ddrescue, so that I can replace the faulty disks and rebuild everything.
> 
> In the meantime, I did a first test on the array md1 (root partition, 
> the one with all my data...)
> 
> Trying to reassemble the array I got:
> 
> # mdadm --assemble --force --verbose /dev/md11 /dev/sda2 /dev/sdb2 /dev/sdc2 /dev/sdd2
> mdadm: forcing event count in /dev/sda2(0) from 622 upto 625
> mdadm: Marking array /dev/md11 as 'clean'
> mdadm: added /dev/sdb2 to /dev/md11 as 1 (possibly out of date)
> mdadm: /dev/md11 has been started with 3 drives (out of 4).
> 
> 
> Then I mounted the array and I saw the correct file system.
> To avoid a new fault (the disks being very unstable), I stopped and removed the array very quickly, so I didn't try to read any files; I just did a few ls's...

Using --assemble --force is the correct thing to do.  It gives you the best chance of getting all your data.
If you don't trust the drives, you should get replacements and use ddrescue to copy the data from the bad device to the new device.  Then assemble the array using the new device.
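
A minimal sketch of that copy-then-assemble step (the device names below are only placeholders - substitute your real source disk, destination disk and map file):

    # first pass: grab everything that reads cleanly, skip the slow retry phase
    ddrescue -f -n /dev/sda2 /dev/sde2 sda2.map
    # second pass: go back and retry only the areas the map file records as bad
    ddrescue -f -r3 /dev/sda2 /dev/sde2 sda2.map

    # then assemble from the copy plus the members you still trust
    mdadm --assemble --force --verbose /dev/md11 /dev/sde2 /dev/sdc2 /dev/sdd2

The map file is what lets the second pass revisit only the sectors that failed the first time, so keep it alongside the image.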

> 
> Now the question.
> 
> I was copying only 3 disks: sdd, sdc, and the "freshest" of the faulty ones, sda. 3 disks out of 4 in a RAID5 should be sufficient...
> But while copying the data I got a read error on sda. I lost just 4 KB, but I do not know which piece of data it belongs to...

You might be lucky and it is a block that isn't used.  You might be unlucky and it is some critical data.  There isn't a lot you can do about that though - the data appears to be gone.
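
If you want to work out which 4 KB it was, the arithmetic can be done from the superblock fields you posted.  A rough sketch, assuming 512-byte sectors plus the left-symmetric layout, 512K chunk and 2048-sector data offset that --examine reports, with BAD_BYTE a made-up placeholder for the position ddrescue logged inside the sda2 image:

    BAD_BYTE=123456789012    # placeholder: byte offset of the unreadable area within sda2
    DEV=0                    # sda2 is "Active device 0"
    N=4                      # raid devices
    CHUNK=1024               # 512K chunk = 1024 sectors
    DATA_OFFSET=2048         # "Data Offset : 2048 sectors"

    SECT=$(( BAD_BYTE / 512 - DATA_OFFSET ))   # sector relative to the start of the data area
    STRIPE=$(( SECT / CHUNK ))
    PD=$(( (N - 1) - STRIPE % N ))             # parity device for this stripe
    if [ "$DEV" -eq "$PD" ]; then
        echo "the bad block held parity only - no data lost"
    else
        DD=$(( (DEV - PD - 1 + N) % N ))       # data-chunk index within the stripe
        echo "array sector: $(( (STRIPE * (N - 1) + DD) * CHUNK + SECT % CHUNK ))"
    fi

Dividing that array offset by the filesystem block size and feeding the result to debugfs (icheck, then ncheck - assuming the root filesystem is ext3/ext4) would then tell you whether any file actually owns the block.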

> 
> So now I'm ddrescue'ing the fourth disk.
> 
> And then what?
> 
> While I wait for the replacement disks (luckily they are under warranty, at least that...), I need some suggestions.
> 
> My plan is to copy the images onto the new disks and then try to assemble the array, but I don't know what the best approach would be (and whether there is anything better than a simple "mdadm --assemble").

Yes, just copy from bad disk to good disk with ddrescue, then assemble with mdadm.
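
If the drives are as flaky as they sound, a cautious variant (again a sketch only, with /dev/sde2 standing in for whatever name the copy ends up with) is to assemble read-only and have a look before anything gets written:

    mdadm --assemble --force --readonly /dev/md11 /dev/sde2 /dev/sdc2 /dev/sdd2
    fsck.ext4 -n /dev/md11        # report-only check, makes no changes (assuming ext4)
    mount -o ro /dev/md11 /mnt    # read-only mount for a first look at the data

Once you are happy with what you see, stop the array and reassemble it normally.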

NeilBrown


> 
> I'm keeping sdc and sdd as they are, since they are intact (for the moment...). On the one hand we have a disk with "old" data (sdb, the first to fail...) but without surface errors; on the other hand we have the disk with the newest data (sda, the last to fail), but with a 4 KB hole.
> Moreover, sda has already been forced to "good"...
> 
> Which options do I have?
> 
> Thanks
> 
> Giulio Carabetta
> 
> ===================================================
>     root@PartedMagic:/mnt# mdadm --examine /dev/sda2
>     /dev/sda2:
>               Magic : a92b4efc
>             Version : 1.2
>         Feature Map : 0x0
>          Array UUID : 4e7bb63f:74d1ac58:b01b1b48:44c7b7d7
>                Name : ubuntu:0
>       Creation Time : Sun Sep 25 09:10:23 2011
>          Raid Level : raid5
>        Raid Devices : 4
>      
>      Avail Dev Size : 3906539520 (1862.78 GiB 2000.15 GB)
>          Array Size : 5859807744 (5588.35 GiB 6000.44 GB)
>       Used Dev Size : 3906538496 (1862.78 GiB 2000.15 GB)
>         Data Offset : 2048 sectors
>        Super Offset : 8 sectors
>               State : active
>         Device UUID : 3d01cfa9:6313d51c:402b3ca5:815a84e9
>      
>         Update Time : Mon Aug 27 20:46:05 2012
>            Checksum : c51fe8dc - correct
>              Events : 622
>      
>              Layout : left-symmetric
>          Chunk Size : 512K
>      
>        Device Role : Active device 0
>        Array State : A.AA ('A' == active, '.' == missing)
>      
>      
>     root@PartedMagic:/mnt# mdadm --examine /dev/sdb2
>     /dev/sdb2:
>               Magic : a92b4efc
>             Version : 1.2
>         Feature Map : 0x0
>          Array UUID : 4e7bb63f:74d1ac58:b01b1b48:44c7b7d7
>                Name : ubuntu:0
>       Creation Time : Sun Sep 25 09:10:23 2011
>          Raid Level : raid5
>        Raid Devices : 4
>      
>      Avail Dev Size : 3906539520 (1862.78 GiB 2000.15 GB)
>          Array Size : 5859807744 (5588.35 GiB 6000.44 GB)
>       Used Dev Size : 3906538496 (1862.78 GiB 2000.15 GB)
>         Data Offset : 2048 sectors
>        Super Offset : 8 sectors
>               State : clean
>         Device UUID : 0c64fdf8:c55ee450:01f05a3c:57b87308
>      
>         Update Time : Mon Aug 27 20:44:22 2012
>            Checksum : fe6eb926 - correct
>              Events : 600
>      
>              Layout : left-symmetric
>          Chunk Size : 512K
>      
>        Device Role : Active device 1
>        Array State : AAAA ('A' == active, '.' == missing)
>      
>      
>     root@PartedMagic:/mnt# mdadm --examine /dev/sdc2
>     /dev/sdc2:
>               Magic : a92b4efc
>             Version : 1.2
>         Feature Map : 0x0
>          Array UUID : 4e7bb63f:74d1ac58:b01b1b48:44c7b7d7
>                Name : ubuntu:0
>       Creation Time : Sun Sep 25 09:10:23 2011
>          Raid Level : raid5
>        Raid Devices : 4
>      
>      Avail Dev Size : 3906539520 (1862.78 GiB 2000.15 GB)
>          Array Size : 5859807744 (5588.35 GiB 6000.44 GB)
>       Used Dev Size : 3906538496 (1862.78 GiB 2000.15 GB)
>         Data Offset : 2048 sectors
>        Super Offset : 8 sectors
>               State : clean
>         Device UUID : 0bb6c440:a2e47ae9:50eee929:fee9fa5e
>      
>         Update Time : Mon Aug 27 20:46:33 2012
>            Checksum : 22e0c195 - correct
>              Events : 625
>      
>              Layout : left-symmetric
>          Chunk Size : 512K
>      
>        Device Role : Active device 2
>        Array State : ..AA ('A' == active, '.' == missing)
>      
>      
>     root@PartedMagic:/mnt# mdadm --examine /dev/sdd2
>     /dev/sdd2:
>               Magic : a92b4efc
>             Version : 1.2
>         Feature Map : 0x0
>          Array UUID : 4e7bb63f:74d1ac58:b01b1b48:44c7b7d7
>                Name : ubuntu:0
>       Creation Time : Sun Sep 25 09:10:23 2011
>          Raid Level : raid5
>        Raid Devices : 4
>      
>      Avail Dev Size : 3906539520 (1862.78 GiB 2000.15 GB)
>          Array Size : 5859807744 (5588.35 GiB 6000.44 GB)
>       Used Dev Size : 3906538496 (1862.78 GiB 2000.15 GB)
>         Data Offset : 2048 sectors
>        Super Offset : 8 sectors
>               State : clean
>         Device UUID : 1f06610d:379589ed:db2a719b:82419b35
>      
>         Update Time : Mon Aug 27 20:46:33 2012
>            Checksum : 3bb3564f - correct
>              Events : 625
>      
>              Layout : left-symmetric
>          Chunk Size : 512K
>      
>        Device Role : Active device 3
>        Array State : ..AA ('A' == active, '.' == missing)
> 
> ===================================================


