On 21/11/2018 12:12, Jogchum Reitsma wrote:
Okay. It's your choice. I think your best option is - having fixed the
time-out problem - to try force-assembling the array using the Blues,
and then do a repair. Then if everything looks good swap the Reds in
using --replace, retiring the Blues.
If you want to just assemble the Reds into a new array and leave the
Blues as a backup, you're likely to end up with silent corruption.
Those blocks that didn't copy will be corrupt, with no way to identify
them. At least if you try to recover the Blues, you're more likely to
trip over read errors and find out what's been corrupted - or, better, the
raid recovery will kick in and repair your data (if checking the Blues
hits a read error it will try to recover; if you use the Reds there
will be no read error, and no attempt to recover the data).
Ah, clear.
So this is what I think I should do (leaving the blues in place for now):
for x in /sys/block/*/device/timeout ; do echo 180 > $x ; done
mdadm --stop /dev/md0
mdadm --assemble --force /dev/md0 /dev/sda /dev/sdb /dev/sdd
mdadm /dev/md0 --add /dev/sdf
Is there a --re-add option? If the re-add fails it will just do a normal
add, but if you've got a bitmap the re-add should just update sdf and do
the missing writes.
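Something like this, I'd guess (untested, and /dev/sdf here is just the drive that fell behind; if the re-add isn't accepted it's safe to fall back to a plain --add):

mdadm /dev/md0 --re-add /dev/sdf || mdadm /dev/md0 --add /dev/sdf
mdadm --detail /dev/md0 | grep -i bitmap   # confirms whether there's a write-intent bitmap at all

With a bitmap the re-add only replays the writes sdf missed; without one it's a full resync either way.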
mdadm --run /dev/md0
I think this will happen automatically with the forced assemble. iirc,
assemble does an implicit run. It's when this implicit run fails that it
leaves the array in a "not working" state, so it has to be stopped before
you can do anything else with it.
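A quick read-only check after the assemble, just to see what state it came up in (nothing here is specific to your box apart from the md0 name):

cat /proc/mdstat          # should show md0 active, possibly degraded
mdadm --detail /dev/md0   # the "State :" line and the per-device list tell the full story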
echo repair > /sys/block/md0/md/sync_action
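To keep an eye on that repair while it runs - a sketch, again only assuming the array is md0:

cat /sys/block/md0/md/sync_action    # reads "repair" while it's running, "idle" when done
cat /sys/block/md0/md/mismatch_cnt   # non-zero if the pass found anything to fix
watch -n 60 cat /proc/mdstat         # progress and an ETA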
What still puzzles me a bit, though it's more curiosity, is that it is
/dev/sdf which has the deviant event count, while this drive is one of
the two that were NOT kicked out of the array.
I'm guessing that because two drives had read errors, the array just
fell over leaving a third drive (sdf) not updated. What normally happens
is that one drive gets kicked, so its count falls behind, then a second
drive gets kicked later to bring the array down. That can't have
happened here. I agree it's a bit weird.
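If you're curious, the event counts (and the role each device last had) are easy to compare straight from the superblocks - read-only, safe to run at any point:

mdadm --examine /dev/sd[abdf] | grep -E '/dev/sd|Events|Device Role'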
When the array is up and running healthy again, the first thing is of
course to update my oldest full backup, then retire the disks with read
errors one by one and replace them with the WD Red ones I already have.
Question here is, will mdadm be confused by the fact that these Red
disks bear copies of the faulty blue ones?
If you use the --replace option, I doubt it. To be on the safe side,
there's the --zero-superblock option that you can use to clean it out.
Or, seeing as you're using raw disks, "dd if=/dev/zero of=/dev/sdx
bs=1024 count=10" would do it - a 1.2 superblock is stored 4K into the
block device, so wiping the first 10K will overwrite it.
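So on each Red, before you add it to the array, something along these lines (sdX is a placeholder - triple-check it's the Red and not a live array member, because this destroys the md metadata on it):

mdadm --examine /dev/sdX           # confirm it still carries the copied superblock
mdadm --zero-superblock /dev/sdX   # wipe it the "official" way
# or, equivalently for a 1.2 superblock sitting 4K in:
# dd if=/dev/zero of=/dev/sdX bs=1024 count=10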
After that, send the faulty disks back to the supplier under guarantee,
buy two new WD Red disks, and retire one by one the Blue ones still in
the array.
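For the record, each swap with --replace would look something like this (sdOLD/sdNEW are placeholders; the nice thing about --replace is that the array keeps full redundancy while the copy runs, unlike fail-then-add):

mdadm /dev/md0 --add /dev/sdNEW                          # the new Red goes in as a spare
mdadm /dev/md0 --replace /dev/sdOLD --with /dev/sdNEW    # copy onto it; sdOLD is marked faulty when done
mdadm /dev/md0 --remove /dev/sdOLD                       # then take the old Blue out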
As I said, they may say "sorry, it's within spec, no dice". I don't know
how much those disks have been hammered, but 1 error per 10TB read is the
guarantee. On a 4TB disk, that's reading it from end to end 2.5 times.
(And yes, drives *should* be a lot better than that, but that's what the
manufacturers guarantee.)
However, given that Reds are probably more expensive, they may be happy
to upgrade them for you for the cost difference.
During this adventure it occurred to me to change the array level to
raid-6, but with disks supporting SCT/ERC it seems less necessary to me.
It would need another 4TB disk to keep the net capacity of the array.
Bear in mind that raid-5 cannot recover from corruption, while raid-6
can: with two parity blocks it can work out which block is wrong, whereas
raid-5 can only tell that something doesn't match. If a disk fails,
raid-5 can rebuild it, but that's all. I'd seriously look at raid-6.
Provided you make sure the timeout on the blues is updated every boot,
they should be fine in your array, and it's worth going to raid-6.
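If you do go that way, the reshape itself is one --grow once the fifth disk is in - a sketch, assuming your current 4-disk raid-5 becomes a 5-disk raid-6, and /dev/sdNEW and the backup-file path are placeholders (keep the backup file off the array itself):

mdadm /dev/md0 --add /dev/sdNEW
mdadm --grow /dev/md0 --level=6 --raid-devices=5 --backup-file=/root/md0-grow.backup
cat /proc/mdstat    # the reshape runs in the background; the array stays usable, just slow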
Just *make* *sure* you monitor for a disk failure!
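mdadm can do that itself in monitor mode (most distros ship it as an mdmonitor service; the mail address is obviously yours, and the config lives in /etc/mdadm.conf or /etc/mdadm/mdadm.conf depending on distro):

mdadm --monitor --scan --test --oneshot --mail=you@example.com   # one-off test that mail actually arrives
mdadm --monitor --scan --daemonise --mail=you@example.com        # or run it as a daemon / set MAILADDR in the config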
Cheers,
Wol