On 19/11/2018 22:35, Jogchum Reitsma wrote:
> Hi,
>
> New to this list, I understand that I can post a problem with an md raid
> array here. If I'm wrong in that supposition, please let me know, and
> accept my apologies!
That's right. Here you get the experts, but they may take time
responding :-(
That said, though I wouldn't call myself an expert, I know enough to
think everything will be fine. How easy would it be for you to back up
the disk if you recover it read-only?
> I have a 4-disk raid5 array, of which two of the disks have been kicked
> out because of read errors. The disks are WD Blue 4TB disks, which are
> still under warranty.
Just looked at the spec of those drives. It looks a bit worrisome to me.
Take a look at the raid wiki:
https://raid.wiki.kernel.org/index.php/Linux_Raid
What I would really like to know is whether these drives support
SCT/ERC. If they don't, there is our first problem. I notice WD says
raids 0 and 1, not 5 ...
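A quick way to check, assuming smartmontools is installed (the drive
letter here is just an example):

  smartctl -l scterc /dev/sda

If the drive supports it, you can set a 7-second error recovery timeout
(the value is in tenths of a second):

  smartctl -l scterc,70,70 /dev/sda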
> I have reasonably recent backups, but I would still like to try to get
> the array alive again.
That shouldn't be hard.
> Funny thing is, mdadm --examine states the array as being raid0:
>
> /dev/md0:
>         Version : 1.2
>      Raid Level : raid0
>   Total Devices : 4
>     Persistence : Superblock is persistent
This seems to be a recent glitch in mdadm. Don't worry about it ...
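If you want to reassure yourself, examine the component devices rather
than /dev/md0 - their superblocks should still report the real raid
level, and you can compare the event counts at the same time. The
device names here are guesses; substitute your actual members:

  mdadm --examine /dev/sd[adef] | grep -E 'Raid Level|Events'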
> Maybe you noticed the fact that all disks are marked as spare, and that
> the event count of one of the disks, /dev/sdf, is different from the
> others'.
>
> I found some more occurrences of a raid5 being recognized as a raid0
> device, but not a real solution to this.
The solution, iirc, was just to stop the array and re-assemble it - as
soon as the array was running, it sorted itself out.
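Something like this, assuming the array is /dev/md0 and guessing at the
member names:

  mdadm --stop /dev/md0
  mdadm --assemble /dev/md0 /dev/sd[b-e]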
> The faulty disks are /dev/sda and /dev/sdd, and I copied the contents
> to new WD RED 4TB disks, with
>
> ddrescue -d -s <size-of-target-disk> -f /dev/sd<source> /dev/sd<target> sd<source>.map
>
> The size argument is needed because the new disks are some 4MB smaller
> than the originals.
>
> ddrescue saw 14 read errors on one disk and 54 on the other, and copied
> 99.99% of the source.
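You can double-check from the map file how much really came across;
assuming GNU ddrescue, its companion tool prints the totals:

  ddrescuelog -t sd<source>.map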
> Is there a way to revive the array, and if yes, how can I do that?
Firstly, if the blues don't support SCT/ERC, you NEED NEED NEED to fix
the timeout mismatch problem. I suspect that's what blew up your array.
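If the blues do support SCT/ERC, set it as per the smartctl example
above. If they don't, the usual workaround from the wiki is to raise
the kernel's command timeout well past the drive's internal retry time.
It's per drive and doesn't survive a reboot, so it wants to go in a
boot script:

  echo 180 > /sys/block/sda/device/timeout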
Now because you've got a bunch of read errors, I suspect you're going to
lose some data, sorry. You have two choices.
1) Force assemble the array with all four drives, and run a repair. This
should fix your read errors, but risks losing data thanks to the event
counter mismatch.
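A sketch of option 1, with made-up member names (the two ddrescue
copies plus the two good originals - substitute your real devices):

  mdadm --stop /dev/md0
  mdadm --assemble --force /dev/md0 /dev/sd[b-e]
  echo repair > /sys/block/md0/md/sync_action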
2) Force assemble the array with the three good drives, then re-add the
fourth. If you've got bitmaps, and it re-adds cleanly, then run a repair
and you *might* get everything back. Otherwise it might just add, in
which case it will re-sync of its own accord; that will give you a
clean array, but the read errors will cause data loss. Sorry.
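And a sketch of option 2, again with assumed names - the three good
members assembled degraded, then the odd-event-count drive (/dev/sdf,
going by your output) re-added:

  mdadm --stop /dev/md0
  mdadm --assemble --force --run /dev/md0 /dev/sdb /dev/sdc /dev/sde
  mdadm --re-add /dev/md0 /dev/sdf
  echo repair > /sys/block/md0/md/sync_action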
Note that before you do any re-assembly, you need to stop the array
(mdadm --stop), otherwise pretty much anything you try will fail with
"device busy".
Cheers,
Wol