Re: Revive a dead md raid5 array

On 21/11/2018 12:12, Jogchum Reitsma wrote:

Okay. It's your choice. I think your best option is - having fixed the time-out problem - to try force-assembling the array using the Blues, and then do a repair. Then if everything looks good swap the Reds in using --replace, retiring the Blues.

If you want to just assemble the Reds into a new array and leave the Blues as a backup, you're likely to end up with silent corruption. The blocks that didn't copy will be corrupt, with no way to identify them. At least if you try to recover the Blues, you're more likely to trip over the read errors and find out what's been corrupted - or better, the raid recovery will kick in and repair your data. (If checking the Blues hits a read error it will try to recover; if you use the Reds there will be no read error, and no attempt to recover the data.)

Ah, clear.

So this is what I think I should do (leaving the blues in place for now):

    for x in /sys/block/*/device/timeout ; do echo 180 > $x ; done
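That loop only lasts until the next boot, by the way. It's also worth checking whether each drive actually supports SCT/ERC and whether the new timeout took - something like this, assuming smartmontools is installed and sda is one of the array members:

    # does the drive support/enable SCT error recovery? (Reds usually do, desktop Blues usually don't)
    smartctl -l scterc /dev/sda

    # confirm the kernel timeout actually changed
    cat /sys/block/sda/device/timeout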

    mdadm --stop /dev/md0

    mdadm --assemble --force /dev/md0 /dev/sda /dev/sdb /dev/sdd

    mdadm /dev/md0 --add /dev/sdf

Is there a --re-add option? If the re-add fails it will just do a normal add, but if you've got a bitmap the re-add should just update sdf and do the missing writes.
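Something like this, assuming the array has an internal write-intent bitmap (--examine will tell you) and sdf's superblock is still recognisably part of the array:

    # check whether there's an internal bitmap on the member
    mdadm --examine /dev/sdf | grep -i bitmap

    # try a re-add first; with a bitmap only the missed writes get replayed
    mdadm /dev/md0 --re-add /dev/sdf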

    mdadm --run /dev/md0

I think this will happen automatically with the forced assemble. iirc, assemble does an implicit run. It's when this implicit run fails that the array is left in a "not working" state, so it has to be stopped before you can do anything else with it.
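Either way, check what state it came up in before kicking off the repair - something along the lines of:

    # quick overview - is md0 up, and with how many members?
    cat /proc/mdstat

    # more detail: array state, degraded or not, per-device status
    mdadm --detail /dev/md0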

    echo repair > /sys/block/md0/md/sync_action
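You can watch the repair run and see whether it actually found anything to fix - roughly:

    # progress and estimated finish time of the repair
    cat /proc/mdstat

    # what the array is doing right now, and how many mismatches it has seen
    cat /sys/block/md0/md/sync_action
    cat /sys/block/md0/md/mismatch_cnt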

What still puzzles me a bit, though it's more curiosity than anything, is that it's /dev/sdf that has the deviant event count, while that drive is one of the two that was NOT kicked out of the array.

I'm guessing that because two drives had read errors, the array just fell over leaving a third drive (sdf) not updated. What normally happens is that one drive gets kicked, so its count falls behind, then a second drive gets kicked later to bring the array down. That can't have happened here. I agree it's a bit weird.
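If you're curious, you can compare the superblocks yourself - something like this, assuming the members are still sda, sdb, sdd and sdf:

    # the drive with the lower Events count is the one that missed writes
    mdadm --examine /dev/sd[abdf] | grep -E 'Events|Update Time|Device Role|Array State'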

When the array is up and running healthy again, the first thing is of course to update my oldest full backup, then retire the disks with read errors one by one and replace them with the WD Red ones I already have. The question here is: will mdadm be confused by the fact that these Red disks bear copies of the faulty Blue ones?

If you use the --replace option, I doubt it. To be on the safe side, there's the --zero-superblock option, which you can use to clean them out first. Or, seeing as you're using raw disks, "dd if=/dev/zero of=/dev/sdx bs=1024 count=10" would do it - a 1.2 superblock is stored 4K into the block device, so wiping the first 10K will overwrite it.
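Roughly, per disk, with sdX standing in for the outgoing Blue and sdY for the incoming Red (adjust the names to suit):

    # wipe the stale superblock the dd copy left on the Red
    mdadm --zero-superblock /dev/sdY

    # add it as a spare, then hot-replace the Blue onto it
    mdadm /dev/md0 --add /dev/sdY
    mdadm /dev/md0 --replace /dev/sdX --with /dev/sdY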

After that, return the faulty disks to the supplier under guarantee, buy two new WD Red disks, and retire the Blue ones still in the array one by one.

As I said, they may say "sorry, it's within spec, no dice". I don't know how hard those disks have been hammered, but one error per 10TB read is all the guarantee covers. On a 4TB disk, that's reading it end to end just 2.5 times. (And yes, drives *should* do a lot better than that, but that's all the manufacturers guarantee.)

However, given that Reds are probably more expensive, they may be happy to upgrade them for you for the cost difference.

During this adventure it occurred to me to change the array level to raid6, but with disks supporting SCT/ERC it seems less necessary to me. It would need another 4TB disk to keep the net capacity of the array.

Bear in mind that raid-5 cannot recover from corruption, while raid-6 can; if a disk fails outright, raid-5 can rebuild it, but that's the limit. I'd seriously look at raid-6. Provided you make sure the timeout on the Blues is updated at every boot, they should be fine in your array, and it's worth going to raid-6. Just *make* *sure* you monitor for disk failures!
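For the record, the conversion itself is just a grow once the extra disk is in - a sketch, with sdg standing in for the new 4TB disk and the mail address obviously a placeholder:

    # add the fifth disk, then reshape the 4-disk raid5 into a 5-disk raid6
    mdadm /dev/md0 --add /dev/sdg
    mdadm --grow /dev/md0 --level=6 --raid-devices=5 --backup-file=/root/md0-grow.backup

    # and make sure something is actually watching the array
    mdadm --monitor --scan --daemonise --mail=you@example.com

And put the timeout loop from the top of your plan somewhere it runs at every boot (rc.local, a udev rule, whatever suits), or the Blues will be back to the 30-second default after the next reboot.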

Cheers,
Wol


