Hi,
I'm afraid I'm not there yet.
On the command

mdadm --assemble --force /dev/md0 /dev/sda /dev/sdb /dev/sdd
mdadm responded
mdadm: /dev/md0 assembled from 2 drives and 1 spare - not enough to
start the array.
Not surprisingly, the next command (there is a --re-add option, as you
suggested, so I used that)

mdadm /dev/md0 --re-add /dev/sdf
failed with
mdadm: Cannot get array info for /dev/md0
Indeed, after re-inspecting the output of mdadm --examine, it appears
that the device role for /dev/sdd is Spare, while the role for /dev/sda,
/dev/sdb, and /dev/sdf (sdf is the drive with the deviant event count)
is Active.
What is the best action here? Take /dev/sdf in the initial assembly
instead of /dev/sdd, and re-add /dev/sdd afterwards?
Or is there a way to tell mdadm that it should treat /dev/sdd as an
active drive?
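If the first option is sane, I imagine it would look something like
this (only a sketch of what I mean, not something I have run; I'm
assuming sdd would go back in with --add rather than --re-add, since
its recorded role is Spare):

mdadm --stop /dev/md0
mdadm --assemble --force /dev/md0 /dev/sda /dev/sdb /dev/sdf
mdadm /dev/md0 --add /dev/sdd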
Cheers, Jogchum
On 22-11-18 at 00:22, Wol's lists wrote:
On 21/11/2018 12:12, Jogchum Reitsma wrote:
Ah, clear.
So this is what I think I should do (leaving the blues in place for
now):
# raise the SCSI command timeout so slow drive error recovery
# doesn't get devices kicked mid-rebuild:
for x in /sys/block/*/device/timeout ; do echo 180 > $x ; done
mdadm --stop /dev/md0
mdadm --assemble --force /dev/md0 /dev/sda /dev/sdb /dev/sdd
mdadm /dev/md0 --add /dev/sdf
mdadm --run /dev/md0
echo repair > /sys/block/md0/md/sync_action
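and then, I suppose, keep an eye on the repair as it runs:

cat /sys/block/md0/md/sync_action
watch -n 60 cat /proc/mdstat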
What still puzzles me a bit, though it's mostly curiosity, is that it
is /dev/sdf which has the deviant event count, while this drive is one
of the two that were NOT kicked out of the array.
I'm guessing that because two drives had read errors, the array just
fell over, leaving a third drive (sdf) not updated. What normally
happens is that one drive gets kicked, so its event count falls behind;
then a second drive gets kicked later, which brings the array down.
That can't have happened here. I agree it's a bit weird.
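The Events lines from --examine tell that story: in the normal
two-step failure, the first drive kicked ends up with a visibly lower
count than the rest. Something like this shows them side by side (a
sketch, assuming 1.x superblocks on the whole disks as in your setup):

mdadm --examine /dev/sd[abdf] | grep -E '^/dev|Events'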
Cheers,
Wol