Hi WOL (and maybe some others reading this)
Sorry for the delay in responding, but this is getting out of my depth
:-( But yes, I think putting sdf in there should work. I don't know how
to tell mdadm that sdd should be active.
If you've got all the drives backed up, then fine. Actually, I think the
best bet from here would be to read up on overlays - it's in the wiki -
and overlay sda, sdb, sdf before re-assembling the array. That way,
you're not actually going to write to the array, so if it goes
pear-shaped you haven't done any damage, and if it works you can just
tear down the overlay (reboot, say) and then assemble for real.
Cheers,
Wol
I read some info about overlays - for example on the Archlinux wiki -
and of course on the raid wiki. But I'm not sure if my understanding of
the matter is enough to get going.
The explanation on the raid wiki uses parallel, and judging from the
first few lines of its man page, parallel itself is rather complicated.
That makes the code presented in the wiki hard to follow, at least for me.
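As far as I can tell, parallel essentially runs the same command once for
every argument listed after the ::: separator. A trivial example of my own
(nothing to do with the array, and harmless to run) would be:

  # prints one line per device; {} is replaced by each argument in turn
  parallel echo "would overlay {}" ::: /dev/sda /dev/sdb /dev/sdf
  # {/} instead of {} would give just the basename: sda, sdb, sdf

If that reading is right, the wiki snippets are simply applying the same
overlay commands to each device in the list.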
I have essentially two questions here:
- in the wiki two functions are defined under the heading "Overlay
manipulation functions". Should I do something with these functions? I
don't see (but that might well be my own shortcoming) where these
functions are called in the "parallel" code snippets
- the overlays are files, which should be at least 1% of the size of the
disks going to be overlaid. In my case that adds up to 160GB. I don't
see (but that might well be ....) a directory or file name in the
examples given. So where are these files placed? If on the root file
system, I have a problem, because I have only some 25GB available there.
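To check whether I understand the mechanism at all, here is roughly what I
think the overlay setup boils down to. This is only my own sketch, not the
wiki code; the directory, device names and sizes are placeholders, and I
have not run it:

  # directory on a filesystem with enough room for the copy-on-write files
  # (placeholder path - this is exactly the part I am unsure about)
  OVL_DIR=/mnt/spare-disk/overlays

  for d in /dev/sda /dev/sdb /dev/sdf; do
      b=$(basename "$d")
      size=$(blockdev --getsz "$d")                    # device size in 512-byte sectors
      truncate -s $(( size * 512 / 100 )) "$OVL_DIR/$b.ovr"   # sparse file, ~1% of the device
      loop=$(losetup -f --show "$OVL_DIR/$b.ovr")      # loop device backed by that file
      # device-mapper snapshot: reads come from the real disk,
      # writes go to the loop file instead of the disk
      echo "0 $size snapshot $d $loop P 8" | dmsetup create "$b"
  done
  # the array would then be assembled from /dev/mapper/sda, /dev/mapper/sdb
  # and /dev/mapper/sdf instead of the real devices

If I read it right, the .ovr files are sparse, so they only consume space as
writes land on the overlay, but they still need to sit on a filesystem with
room to grow - hence my worry about the 25GB on the root filesystem.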
Aside from all this, I still have some questions about what I see as the
state of my array. Here I hope some of the experts reading this can be of
some help...
- why is a drive, once kicked out of the array, declared a spare by mdadm?
- is there a way of resetting that?
- maybe the most puzzling one: how is it possible that a drive (in my case
/dev/sdf) which has never been kicked out of the array, and which
according to smartctl is healthy (Raw_Read_Error_Rate = 0), has an event
count 859 lower than all the other drives? It's 1272457 vs 1273316... (a
read-only way to dump these counters is sketched after this list)
- suppose I get the array up and running again, what are the chances of
data corruption, and if they are above zero (which I think they will be),
is there a way to see which files are corrupt?
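For reference, a read-only loop along these lines should dump the event
counts and per-device state without touching anything (the device names are
just examples; adjust them if the array members are partitions):

  for d in /dev/sda /dev/sdb /dev/sdd /dev/sdf; do
      echo "== $d =="
      mdadm --examine "$d" | grep -E 'Events|Device Role|Array State'
  done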
Cheers, and thanks!
Jogchum