Hi all
I've come across some information describing a similar situation, albeit
without RAID:
http://sourceforge.net/projects/clonezilla/forums/forum/663168/topic/4833772
Importantly, the errors given at the above URL are very similar to
errors I noticed whenever the server crashed:
[76635.205262] ata1.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0
...(similar lines, including one: failed command: READ FPDMA QUEUED)
[76635.210673] ata1.00: status: { DRDY ERR }
[76635.210698] ata1.00: error: { UNC }
I'm attempting to use ddrescue, as described here, to clone the
fail{ed,ing} disk to another spare disk:
http://www.forensicswiki.org/wiki/Ddrescue
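The plan is roughly the usual two-pass invocation below - a sketch only,
since the device names (/dev/sdc as the failing source, /dev/sdg as the
spare target) and the mapfile path are placeholders:

  # first pass: grab everything readable, don't dwell on bad areas
  $ ddrescue -f -n /dev/sdc /dev/sdg /root/rescue.log
  # second pass: go back and retry the bad areas a few times
  $ ddrescue -f -r3 /dev/sdc /dev/sdg /root/rescue.log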
Hopefully this part works out, but I'm still not 100% sure if I'm doing
"the right thing" to get this sorted or if there's an alternative method.
On 2012/07/04 10:18 PM, Brendan Hide wrote:
Hi, all
In case it's relevant, I'm using Arch Linux's LTS kernel 3.0.36-1-lts and
mdadm v3.2.5 (18 May 2012). I first asked for help on the Arch Linux
forums but have had zero response:
https://bbs.archlinux.org/viewtopic.php?id=144448
I have (had?) a RAID5 array of 4x 1.5TB drives (which works out to 4.5TB,
or about 4.1TiB, usable). I added another drive, went through the
standard growth procedure, and everything seemed fine. At about 66%
through the reshape one of the disks failed and, due to the resulting
blocking errors (some details below), the server eventually
crashed/panicked/rebooted/something. I was away at the time, but I did at
least get a FailSpare notification mail with the following md3 detail
before the crash:
md3 : active raid5 sdb1[6](S) sdf1[4] sde1[3] sdc1[5](F) sdd1[1]
      4395408384 blocks super 1.2 level 5, 512k chunk, algorithm 2 [5/3] [_UUU_]
      [=============>.......]  reshape = 66.8% (980003500/1465136128) finish=759.8min speed=10640K/sec
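For reference, the "standard growth procedure" above was the usual
add-then-grow sequence - a sketch, with the new disk's name only
inferred from the device numbers in the listing above:

  # /dev/sdb is my guess at the newly added disk
  $ mdadm /dev/md3 --add /dev/sdb1
  $ mdadm --grow /dev/md3 --raid-devices=5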
In theory all my data should still be available on the remaining disks;
I just don't know how to get to it. Here's what I've been trying so far:
* Attempting to assemble the array with 4 out of 5 drives is
  unsuccessful because the new drive appears to be seen as a "spare" -
  perhaps that is standard until it has been fully integrated into the
  array (the command is sketched after this list). The output is:

      mdadm: /dev/md3 assembled from 3 drives and 1 spare - not enough
      to start the array.
* Attempting to assemble the array with all 5 drives works briefly but,
  no matter what I do, mdadm tries to finish the reshape. Two minutes
  after the assemble attempt, because the failed disk is giving an
  apparently permanent read error, the console starts printing messages
  along the lines of:

      INFO: task md3_reshape:$PID blocked for more than 120 seconds.
      [ 1080.320000] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
      disables this message

  This happens even though /proc/mdstat shows that the disk has failed
  and that the array is degraded. After a few minutes of these errors I
  have to REISUB (or even hard-reset) because the server gradually
  becomes unresponsive, and I really don't want to do that too often.
  I've tried the "--freeze-reshape" flag (also sketched after this list)
  but I'm either doing it wrong or I'm misunderstanding the purpose of
  that option.
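For reference, the 4-of-5 attempt from the first item looks roughly like
this - only a sketch, since the device names are the ones from the
post-reboot mdstat further down and they shift between boots:

  # every remaining member, with the failed disk left out
  $ mdadm --assemble /dev/md3 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1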
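The 5-of-5 attempt from the second item, with the reshape frozen, is
roughly the following, using the pre-crash names from the first mdstat:

  # all five members, asking mdadm not to resume the reshape on assembly
  $ mdadm --assemble --freeze-reshape /dev/md3 /dev/sdb1 /dev/sdc1 \
        /dev/sdd1 /dev/sde1 /dev/sdf1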
This is the status immediately after booting with the failed disk
unplugged. A reassemble requires a --stop, optionally plugging in the
failed drive and a --stop again, and then the --assemble command
(sketched after the output below):
$ cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4]
md3 : inactive sdc1[1](S) sdb1[6](S) sde1[4](S) sdd1[3](S)
      5860546144 blocks super 1.2
md1 : active raid1 sdf3[2] sda3[0]
      239746096 blocks super 1.2 [2/2] [UU]
      bitmap: 1/2 pages [4KB], 65536KB chunk
md4 : active raid1 sdf2[1] sda2[0]
      4193268 blocks super 1.2 [2/2] [UU]
      bitmap: 0/1 pages [0KB], 65536KB chunk
md0 : active raid1 sdf1[1] sda1[0]
      255936 blocks [2/2] [UU]
      bitmap: 0/1 pages [0KB], 65536KB chunk
unused devices: <none>
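The stop/reassemble cycle mentioned above is, roughly:

  $ mdadm --stop /dev/md3
  # optionally plug the failed drive back in; if md grabs it and the
  # array shows up inactive again, --stop once more:
  $ mdadm --stop /dev/md3
  # ...then one of the --assemble commands sketched earlier.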
The server is a personal file server. It contains a lot of unimportant
data but it does contain some important documents and photos I'd like
to retrieve. Any help would be appreciated.
--
Brendan Hide
083 448 3867
http://swiftspirit.co.za/