Re: md_raid5 recovering failed need help

Good morning David,

{ or Stephan ? }

On 12/25/2014 09:24 AM, Stephan Hafiz wrote:
> Hi! I'm from Germany, and my RAID and I need help.
> My English isn't very good, but I think it's sufficient. And I think this mailing list is my last hope ☺

This is the right place for problems with linux raid arrays.

> So on, …. Here is my problem.
> The RAID5 has lost 2 of 5 disks. First one disk, and then the second one.

Ok.  Not uncommon.

[trim /]

> !SMART Status
> for i in a b c d e f; do echo Device  sd$i; smartctl -H /dev/sd$i | egrep overall; echo; done;
> Device sda
> SMART overall-health self-assessment test result: PASSED
> 
> Device sdb
> SMART overall-health self-assessment test result: PASSED
> 
> Device sdc
> SMART overall-health self-assessment test result: PASSED
> 
> Device sdd
> SMART overall-health self-assessment test result: PASSED
> 
> Device sde
> SMART overall-health self-assessment test result: PASSED
> 
> Device sdf
> SMART overall-health self-assessment test result: PASSED

It is extremely common to have an overall result of "PASSED" when you
aren't safe at all.  Please redo this without trimming, like so:

for x in /dev/sd[b-f] ; do echo $x ; smartctl -x $x ; done

Paste the result at the end of your next mail; no need for an attachment
or a pastebin service.
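
If collecting all of that is awkward, you can redirect the whole run into
one file and paste from that (the filename is just an example):

for x in /dev/sd[b-f] ; do echo "== $x" ; smartctl -x $x ; done > smart-full.txt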

Also, if you still have any syslogs from the time of the failure, it
would be good to see the kernel messages that triggered the drive
ejections from the raid.
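
Exactly where those logs live depends on the distro; something along
these lines should dig out the relevant lines (rotated logs may be
compressed, in which case use zgrep instead):

grep -iE 'md/raid|ata[0-9]+|I/O error' /var/log/syslog* /var/log/messages* 2>/dev/null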

> !mdadm version
> mdadm - v3.2.5 - 18th May 2012
> I have read about the recent 3.3.x versions at raid.wiki.kernel.org, but I haven't tested them.

It may be necessary.  You haven't reported your distro or your kernel
version.
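
For your next mail, something like this collects both without changing
anything (on older distros /etc/os-release may be missing; /etc/issue
works too):

uname -r ; cat /etc/os-release 2>/dev/null ; mdadm --version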

> !superblock informations
> Only the Events from sdb1 are off

[trim /]

Very good report!  You've saved all the superblocks and you haven't
tried to do any --create operations.
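
For the record, that superblock state is read-only to capture, so it can
be refreshed at any time with something like:

mdadm --examine /dev/sd[b-f]1 | egrep 'dev/sd|Update Time|Events|Role'

The exact field names depend on the metadata version, but the Events
counters and update times are what a forced assemble cares about.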

[trim /]

> !reassemble force
> mdadm --assemble /dev/md0 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1 --force
> mdadm: ignoring /dev/sdd1 as it reports /dev/sdc1 as failed
> mdadm: ignoring /dev/sde1 as it reports /dev/sdc1 as failed
> mdadm: ignoring /dev/sdf1 as it reports /dev/sdc1 as failed
> mdadm: /dev/md0 assembled from 1 drive - not enough to start the array.

This should have worked.  Hmmm.

> I hope I don't get the award for "painting oneself into a corner" ……

Probably not. :-)

The simplest way forward would probably be to boot a rescue CD (I
generally use the one from sysrescuecd.org) that has a recent kernel and
mdadm combination.  Such a CD will probably try to assemble your array
during boot as /dev/md127 instead of /dev/md0, and that assembly will fail.

So, within the rescue environment, do:

mdadm --stop /dev/md127    # or whatever shows in /proc/mdstat

mdadm --assemble --force --verbose /dev/md0 /dev/sd[b-f]1

If that doesn't work, show us the verbose output, along with the
matching part of the dmesg.

If it does work, just do a clean shutdown and reboot back into your
regular OS.
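
Before you shut down, though, a quick read-only sanity check is worth the
time (this assumes the filesystem sits directly on /dev/md0; adjust if
LVM or anything else is layered on top):

cat /proc/mdstat
mdadm --detail /dev/md0
fsck -n /dev/md0     # -n: report only, change nothing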

> merry christmas … David

And Merry Christmas to you!

When you are done celebrating the revival of your array, you will need
to find out why it broke in the first place.  The most common cause seen
on this list is the use of consumer-grade drives without dealing with
the timeout mismatch problem.  You might want to review this old thread:

http://marc.info/?l=linux-raid&m=135811522817345&w=1
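
The short version: desktop drives can spend minutes retrying a bad sector
internally while the kernel's default command timeout is 30 seconds, so
the drive gets kicked out of the array instead of the sector being
rewritten from parity.  The usual fixes look roughly like this, per drive,
at every boot (sdX is a placeholder):

smartctl -l scterc /dev/sdX                 # does the drive support SCT ERC?
smartctl -l scterc,70,70 /dev/sdX           # if yes: limit error recovery to 7.0 seconds
echo 180 > /sys/block/sdX/device/timeout    # if no: raise the kernel timeout instead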

Phil
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


