On 21/11/2018 12:12, Jogchum Reitsma wrote:
Okay. It's your choice. I think your best option is - having fixed the
time-out problem - to try force-assembling the array using the Blues,
and then do a repair. Then if everything looks good swap the Reds in
using --replace, retiring the Blues.
If you want to just assemble the Reds into a new array and leave the
Blues as a backup, you're likely to end up with silent corruption.
Those blocks that didn't copy will be corrupt, with no way to identify
them. At least if you try to recover the Blues, you're more likely to
trip over read errors and find out what's been corrupted - or, better, the
raid recovery will kick in and repair your data (if checking the Blues
hits a read error it will try to recover; if you use the Reds there
will be no read error, and no attempt to recover the data).
Ah, clear.
So this is what I think I should do (leaving the blues in place for now):
for x in /sys/block/*/device/timeout ; do echo 180 > $x ; done
mdadm --stop /dev/md0
mdadm --assemble --force /dev/md0 /dev/sda /dev/sdb /dev/sdd
mdadm /dev/md0 --add /dev/sdf
Is there a --re-add option? If the re-add fails it will just do a normal
add, but if you've got a bitmap the re-add should just update sdf and do
the missing writes.
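Something like this, I'd guess (untested, and /dev/sdf here is just the drive that fell behind; if the re-add isn't accepted it's safe to fall back to a plain --add):

mdadm /dev/md0 --re-add /dev/sdf || mdadm /dev/md0 --add /dev/sdf
mdadm --detail /dev/md0 | grep -i bitmap   # confirms whether there's a write-intent bitmap at all

With a bitmap the re-add only replays the writes sdf missed; without one it's a full resync either way.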
mdadm --run /dev/md0
I think this will happen automatically with the forced assemble. iirc,
assemble does an implicit run. It's when this implicit run fails that it
leaves the array in a "not working" state, so it has to be stopped before
you can do anything else with it.
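A quick read-only check after the assemble, just to see what state it came up in (nothing here is specific to your box apart from the md0 name):

cat /proc/mdstat          # should show md0 active, possibly degraded
mdadm --detail /dev/md0   # the "State :" line and the per-device list tell the full story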
echo repair > /sys/block/md0/md/sync_action
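To keep an eye on that repair while it runs - a sketch, again only assuming the array is md0:

cat /sys/block/md0/md/sync_action    # reads "repair" while it's running, "idle" when done
cat /sys/block/md0/md/mismatch_cnt   # non-zero if the pass found anything to fix
watch -n 60 cat /proc/mdstat         # progress and an ETA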
What still puzzles me a bit, though it's more curiosity, is that it is
/dev/sdf which has the deviant event count, while this drive is one of
the two that were NOT kicked out of the array.
I'm guessing that because two drives had read errors, the array just
fell over leaving a third drive (sdf) not updated. What normally happens
is that one drive gets kicked, so its count falls behind, then a second
drive gets kicked later to bring the array down. That can't have
happened here. I agree it's a bit weird.
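If you're curious, the event counts (and the role each device last had) are easy to compare straight from the superblocks - read-only, safe to run at any point:

mdadm --examine /dev/sd[abdf] | grep -E '/dev/sd|Events|Device Role'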
When the array is up and running healthy again, the first thing is of
course to update my oldest full backup, then retire the disks with read
errors one by one and replace them with the WD Red ones I already have.
Question here is, will mdadm be confused by the fact that these Red
disks bear copies of the faulty blue ones?
If you use the --replace option, I doubt it. To be on the safe side,
there's the --zero-superblock option that you can use to clean it out.
Or, seeing as you're using raw disks, "dd if=/dev/zero of=/dev/sdx
bs=1024 count=10" would do it - a 1.2 superblock is stored 4K into the
block device, so wiping the first 10K will overwrite it.
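So on each Red, before you add it to the array, something along these lines (sdX is a placeholder - triple-check it's the Red and not a live array member, because this destroys the md metadata on it):

mdadm --examine /dev/sdX           # confirm it still carries the copied superblock
mdadm --zero-superblock /dev/sdX   # wipe it the "official" way
# or, equivalently for a 1.2 superblock sitting 4K in:
# dd if=/dev/zero of=/dev/sdX bs=1024 count=10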
After that, send the faulty disks back to the supplier under guarantee,
buy two new WD Red disks, and retire one by one the Blue ones still in
the array.
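For the record, each swap with --replace would look something like this (sdOLD/sdNEW are placeholders; the nice thing about --replace is that the array keeps full redundancy while the copy runs, unlike fail-then-add):

mdadm /dev/md0 --add /dev/sdNEW                          # the new Red goes in as a spare
mdadm /dev/md0 --replace /dev/sdOLD --with /dev/sdNEW    # copy onto it; sdOLD is marked faulty when done
mdadm /dev/md0 --remove /dev/sdOLD                       # then take the old Blue out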
As I said, they may say "sorry, it's within spec, no dice". I don't know
how much those disks have been hammered, but 1 error per 10TB read is the
guarantee. On a 4TB disk, that's reading it from end to end 2.5 times.
(And yes, drives *should* be a lot better than that, but that's what the
manufacturers guarantee.)
However, given that Reds are probably more expensive, they may be happy
to upgrade them for you for the cost difference.
During this adventure it occurred to me to change the array level to
raid-6, but with disks supporting SCT/ERC it seems less necessary to me.
It would need another 4TB disk to keep the net capacity of the array.
Bear in mind that raid-5 cannot recover from corruption, while raid-6
can: with two parity blocks it can work out which block is wrong, whereas
raid-5 can only tell that something doesn't match. If a disk fails,
raid-5 can rebuild it, but that's all. I'd seriously look at raid-6.
Provided you make sure the timeout on the blues is updated every boot,
they should be fine in your array, and it's worth going to raid-6.
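If you do go that way, the reshape itself is one --grow once the fifth disk is in - a sketch, assuming your current 4-disk raid-5 becomes a 5-disk raid-6, and /dev/sdNEW and the backup-file path are placeholders (keep the backup file off the array itself):

mdadm /dev/md0 --add /dev/sdNEW
mdadm --grow /dev/md0 --level=6 --raid-devices=5 --backup-file=/root/md0-grow.backup
cat /proc/mdstat    # the reshape runs in the background; the array stays usable, just slow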
Just *make* *sure* you monitor for a disk failure!
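mdadm can do that itself in monitor mode (most distros ship it as an mdmonitor service; the mail address is obviously yours, and the config lives in /etc/mdadm.conf or /etc/mdadm/mdadm.conf depending on distro):

mdadm --monitor --scan --test --oneshot --mail=you@example.com   # one-off test that mail actually arrives
mdadm --monitor --scan --daemonise --mail=you@example.com        # or run it as a daemon / set MAILADDR in the config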
Cheers,
Wol