Updating this e-mail thread: I built the latest mdadm version, which
supports a per-device data offset, and attempted to reconstruct the
RAID6 using the information gathered previously, but so far no luck.
As far as I can tell (sadly), all of my data is lost. I've updated the
forum thread with the final details and failures.
http://www.linuxquestions.org/questions/linux-server-73/mdadm-raid6-active-with-spares-and-failed-disks%3B-need-help-4175530127/
I'll leave the drives "in this state" until the end of the month in
hopes that someone has another idea on how to recover.
NOTE: I will pay $$$ if there is a person that helps me to recover the
data :)
~Matt
-------- Original Message --------
From: Matt Callaghan <matt_callaghan@xxxxxxxxxxxx>
Sent: Wed 07 Jan 2015 08:34:01 AM EST
To: linux-raid@xxxxxxxxxxxxxxx
Cc:
Subject: Re: mdadm RAID6 "active" with spares and failed disks; need help
Just to give a small update (I realize many people may still be on
holidays)
I've been working with a few people on IRC, and reading up on others'
experiences recovering similar arrays, but no luck yet.
I /hope/ I haven't ruined anything.
The forum post referenced below has full details, but here's a summary
of "what happened".
Notice how some drives are "moving" around :( [either due to a mistake I
made, or the server halting/locking up during rebuilds, I'm not sure]
{{{
----------------------------------------------------------------------------------------------------------------------------
| DEVICE   | COMMENTS | Device Role #                                                                                       |
|          |          | Dec GOOD | Jan4 6:28AM | 12:10PM | 12:40PM | Jan5 12:30AM | 12:50AM | 8:30AM | 6:34PM | Jan6 6:45AM |
----------------------------------------------------------------------------------------------------------------------------
| /dev/sdi |          | 4        | 4           | 4       | 4       | 4            | 4       | 4      | 4      | 4           |
| /dev/sdj | failing  | 5        | 5 FAIL      | ( )     | 8       | 8            | 8 FAIL  | ( )    | ( )    | ( )         |
| /dev/sdk | failing? | 0        | 0           | 0       | 0       | 0            | 0       | 0      | 0 FAIL | 0 FAIL      |
| /dev/sdl |          | 6        | 6           | 6       | 6       | 6            | 6       | 6      | 6      | 6           |
| /dev/sdm |          | 1        | 1           | 1       | 1       | ( )          | ( )     | ( )    | 8      | 8 SPARE     |
| /dev/sdn |          | 2        | 2           | 2       | 2       | 2            | 2       | 2      | 2      | 2           |
| /dev/sdo |          | 3        | 3           | 3       | 3       | 3            | 3       | 3      | 3      | 3           |
| /dev/sdp |          | 7        | 7           | 7       | 7       | 7            | 7       | 7      | 7      | 7           |
----------------------------------------------------------------------------------------------------------------------------
}}}
Full details from my e-mail notifications of /proc/mdstat (although
unfortunately I don't have FULL mdadm --detail/examine information per
state transition):
{{{
Dec GOOD
md2000 : active raid6 sdo1[3] sdj1[5] sdk1[0] sdi1[4] sdn1[2] sdm1[1]
sdp1[7] sdl1[6]
11721080448 blocks super 1.1 level 6, 64k chunk, algorithm 2
[8/8] [UUUUUUUU]
FAIL EVENT on Jan 4th @ 6:28AM
md2000 : active raid6 sdo1[3] sdj1[5](F) sdk1[0] sdi1[4] sdn1[2] sdm1[1]
sdp1[7] sdl1[6]
11721080448 blocks super 1.1 level 6, 64k chunk, algorithm 2
[8/7] [UUUU_UUU]
[==============>......] check = 73.6% (1439539228/1953513408)
finish=536.6min speed=15960K/sec
DEGRADED EVENT on Jan 4th @ 6:39AM
md2000 : active raid6 sdo1[3] sdj1[5](F) sdk1[0] sdi1[4] sdn1[2] sdm1[1]
sdp1[7] sdl1[6]
11721080448 blocks super 1.1 level 6, 64k chunk, algorithm 2
[8/7] [UUUU_UUU]
[==============>......] check = 73.6% (1439539228/1953513408)
finish=5091.8min speed=1682K/sec
DEGRADED EVENT on Jan 4th @ 12:10PM
md2000 : active raid6 sdo1[3] sdn1[2] sdi1[4] sdm1[1] sdk1[0] sdp1[7]
sdl1[6]
11721080448 blocks super 1.1 level 6, 64k chunk, algorithm 2
[8/7] [UUUU_UUU]
DEGRADED EVENT on Jan 4th @ 12:21PM
md2000 : active raid6 sdk1[0] sdo1[3] sdm1[1] sdn1[2] sdi1[4] sdp1[7]
sdl1[6]
11721080448 blocks super 1.1 level 6, 64k chunk, algorithm 2
[8/7] [UUUU_UUU]
DEGRADED EVENT on Jan 4th @ 12:40PM
md2000 : active raid6 sdj1[8] sdm1[1] sdo1[3] sdn1[2] sdk1[0] sdi1[4]
sdp1[7] sdl1[6]
11721080448 blocks super 1.1 level 6, 64k chunk, algorithm 2
[8/7] [UUUU_UUU]
[>....................] recovery = 0.2% (5137892/1953513408)
finish=921.7min speed=35227K/sec
DEGRADED EVENT on Jan 5th @ 12:30AM
md2000 : active raid6 sdk1[0] sdo1[3] sdn1[2] sdj1[8] sdi1[4] sdl1[6]
sdp1[7]
11721080448 blocks super 1.1 level 6, 64k chunk, algorithm 2
[8/6] [U_UU_UUU]
[============>........] recovery = 62.9% (1229102028/1953513408)
finish=259.8min speed=46466K/sec
FAIL SPARE EVENT on Jan 5th @ 12:50AM
md2000 : active raid6 sdk1[0] sdo1[3] sdn1[2] sdj1[8](F) sdi1[4] sdl1[6]
sdp1[7]
11721080448 blocks super 1.1 level 6, 64k chunk, algorithm 2
[8/6] [U_UU_UUU]
[=============>.......] recovery = 68.1% (1332029020/1953513408)
finish=150.3min speed=68897K/sec
DEGRADED EVENT on Jan 5th @ 6:43AM
md2000 : active raid6 sdk1[0] sdo1[3] sdn1[2] sdj1[8](F) sdi1[4] sdl1[6]
sdp1[7]
11721080448 blocks super 1.1 level 6, 64k chunk, algorithm 2
[8/6] [U_UU_UUU]
[=============>.......] recovery = 68.1% (1332029020/1953513408)
finish=76028.6min speed=136K/sec
TEST MESSAGE on Jan 5th @ 8:30AM
md2000 : active raid6 sdo1[3] sdi1[4] sdn1[2] sdk1[0] sdl1[6] sdp1[7]
11721080448 blocks super 1.1 level 6, 64k chunk, algorithm 2
[8/6] [U_UU_UUU]
}}}
I've tried mdadm --create --assume-clean with several permutations of
the "device role # ordering", but so far none has exposed a usable ext4
filesystem on /dev/md2000.
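For anyone who wants to sanity-check the attempts: a minimal sketch of how the candidate orderings can be enumerated, assuming the role assignments from the table above are correct for the stable devices and only sdm/sdj are uncertain (the flags mirror the array's original geometry: level 6, 8 devices, 64k chunk, 1.1 metadata). The script only prints candidate commands; nothing is executed.

```python
from itertools import permutations

# Role assignments that stayed stable through every mdstat snapshot
# (taken from the table above; treat as assumptions if your logs differ).
fixed = {0: "sdk1", 2: "sdn1", 3: "sdo1", 4: "sdi1", 6: "sdl1", 7: "sdp1"}

# Devices whose role moved around during the failed rebuilds.
uncertain = ["sdm1", "sdj1"]
open_slots = [r for r in range(8) if r not in fixed]  # roles 1 and 5

candidates = []
for perm in permutations(uncertain, len(open_slots)):
    order = dict(fixed)
    order.update(zip(open_slots, perm))
    devices = " ".join("/dev/" + order[r] for r in range(8))
    candidates.append(
        "mdadm --create /dev/md2000 --assume-clean --level=6 "
        "--raid-devices=8 --chunk=64 --metadata=1.1 " + devices
    )

for cmd in candidates:
    print(cmd)
```

Since this is RAID6, any one of the uncertain devices could also be replaced with the literal word "missing" in the device list, which keeps mdadm from writing to a suspect disk during the experiment.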
I was speaking with someone on IRC, and it turns out the default data
offset mdadm uses for member devices has changed across mdadm versions,
so I need to recompile mdadm 3.3.x and attempt the re-create that way.
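For reference, each member's current data offset can be read out of `mdadm --examine` before any re-create attempt, so the old value can be fed back in via mdadm 3.3's `--data-offset` option. A minimal sketch of pulling the value out of the text (the sample output below is illustrative only, not from my disks):

```python
import re

# Hypothetical fragment of `mdadm --examine /dev/sdk1` output; real
# values must come from your own disks before any re-create attempt.
examine_output = """\
/dev/sdk1:
          Magic : a92b4efc
        Version : 1.1
    Data Offset : 2048 sectors
   Super Offset : 0 sectors
"""

def data_offset_sectors(text):
    """Extract the per-device data offset (in 512-byte sectors)."""
    m = re.search(r"Data Offset\s*:\s*(\d+) sectors", text)
    return int(m.group(1)) if m else None

print(data_offset_sectors(examine_output))  # 2048 for this sample
```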
I'll update when I get to trying that.
~Fermmy
-------- Original Message --------
From: Matt Callaghan <matt_callaghan@xxxxxxxxxxxx>
Sent: Tue 06 Jan 2015 09:16:52 AM EST
To: linux-raid@xxxxxxxxxxxxxxx
Cc:
Subject: mdadm RAID6 "active" with spares and failed disks; need help
I think I'm in a really bad state. Could an expert w/ mdadm please
help?
I have a RAID6 mdadm device, and it got really messed up with spares:
{{{
md2000 : active raid6 sdm1[8](S) sdo1[3] sdi1[4] sdn1[2] sdk1[0](F)
sdl1[6] sdp1[7]
11721080448 blocks super 1.1 level 6, 64k chunk, algorithm 2
[8/5] [__UU_UUU]
}}}
And it is now really broken (inactive):
{{{
md2000 : inactive sdn1[2](S) sdm1[8](S) sdl1[6](S) sdp1[7](S) sdi1[4](S)
sdo1[3](S) sdk1[0](S)
13674593976 blocks super 1.1
}}}
I have a forum post going w/ full details:
http://www.linuxquestions.org/questions/linux-server-73/mdadm-raid6-active-with-spares-and-failed-disks%3B-need-help-4175530127/
I /think/ I need to force re-assembly here, but I'd like some review
from the experts before proceeding.
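For the record, the sequence I understand for forced re-assembly of an all-spares inactive array is below. This is only a sketch of what I'd run; please correct me before I touch anything:

```shell
# Stop the half-assembled, inactive array first.
mdadm --stop /dev/md2000

# Forced assembly: mdadm bumps the event counts of slightly-stale
# members so they can rejoin; --readonly avoids kicking off a resync
# while the result is inspected. sdj1 is left out since it was failing.
mdadm --assemble --force --readonly /dev/md2000 \
    /dev/sdi1 /dev/sdk1 /dev/sdl1 /dev/sdm1 /dev/sdn1 /dev/sdo1 /dev/sdp1

# Verify the state before mounting anything.
cat /proc/mdstat
mdadm --detail /dev/md2000
```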
Thank you in advance for your time,
~Matt/Fermulator
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html