On Jan 19, 2008, at 3:44 AM, Ask Bjørn Hansen wrote:
Replying to myself with an update, mostly for the sake of the archives
(I went through the linux-raid mail from the last year yesterday while
waiting for my raw-partition backups to finish).
Yesterday I mentioned[1] my trouble with the multipath detection code in
the Fedora rescue mode messing up my raid.
[...]
I suspect that maybe the layout of the md device got messed up? How
can I find out if that's the case? Would it be possible to recover
from that (assuming all the data is still on some of the disks)?
I realized that of the 11 disks (9 in the raid, 2 spares), one of the
disks affected by the "fake multipath" mishap was a spare. So after
backing up all the raw partitions[2], I re-created the raid in place
with the other affected disk marked as missing, and the file system
seems to be more or less okay. Yes, I'm doing a backup now. :-)
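As a rough sketch of the re-create step described above: the command has to name the md device, pass --assume-clean, and list the members in their original slot order with "missing" in place of the suspect disk. The device names, raid level, chunk size and member count below are hypothetical placeholders (the real values would come from saved mdadm -E output, and the metadata version has to match as well). The script only prints the command, so it can be reviewed before anything touches the disks.

```shell
#!/bin/sh
# Dry-run sketch: build (and only print) an mdadm re-create command.
# Every value here is a hypothetical placeholder, not the original array's.
MD_DEV=/dev/md0          # assumed md device name
LEVEL=5                  # assumed raid level
CHUNK=64                 # assumed chunk size in KiB (check mdadm -E output)
# Member slots in their ORIGINAL order; "missing" marks the suspect disk.
MEMBERS="/dev/sda5 /dev/sdb5 missing /dev/sdd5"

build_recreate_cmd() {
    # Split MEMBERS into positional parameters to count the slots,
    # including the "missing" one.
    set -- $MEMBERS
    echo "mdadm --create $MD_DEV --assume-clean --level=$LEVEL --chunk=$CHUNK --raid-devices=$# $MEMBERS"
}

build_recreate_cmd
```

Printing first is deliberate: leaving out the md device or --assume-clean is exactly the kind of slip that is cheap to catch on screen and expensive to catch on disk.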
Lessons:
1) Do backups of your raid'ed data. Yes, it can be a pain but
figure it out.
2) Keep your root partition on a simple raid1 (or on an LVM volume
group that's on a simple raid1).
3) When the raid goes @#$ - don't panic, make sure nothing is being
written to the disks and stop. (Some years ago I lost a raid5 to the
"oops, had a read-error, drop the last disk" issue and I suspect I
could have saved it had I been patient and stopped working on it until
I was more awake).
4) Have/make copies of the mdadm -D / -E output.
5) If you care about the data, do a backup of your raw partitions
before trying to restore.
6) The "create the raid on top of the old raid" trick saves the day
again (for a while I had some kind of cabling problem on a box with a
raid6 - I lost track of how many times I did the recreate thing).
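Lessons 4 and 5 can be sketched together as one small script, assuming hypothetical device names and a /backup directory with enough space. It records mdadm -D for the array and mdadm -E for each member, then images each raw partition with dd. By default it only prints the commands; setting DRY_RUN=0 would run them.

```shell
#!/bin/sh
# Sketch: record array metadata and image each member before any repair.
# MD_DEV, PARTS and BACKUP_DIR are hypothetical; adjust to the real layout.
MD_DEV=${MD_DEV:-/dev/md0}
PARTS=${PARTS:-"/dev/sda5 /dev/sdb5"}
BACKUP_DIR=${BACKUP_DIR:-/backup}
DRY_RUN=${DRY_RUN:-1}    # 1 = only print the commands, 0 = execute them

run() {
    # Print the command in dry-run mode, otherwise execute it via sh
    # so that the output redirection inside the string takes effect.
    if [ "$DRY_RUN" = 1 ]; then echo "$*"; else sh -c "$*"; fi
}

save_state() {
    # Array-level view (-D), then per-member superblock (-E) and raw image.
    run "mdadm -D $MD_DEV > $BACKUP_DIR/mdadm-D.txt"
    for p in $PARTS; do
        name=$(basename "$p")
        run "mdadm -E $p > $BACKUP_DIR/mdadm-E-$name.txt"
        run "dd if=$p of=$BACKUP_DIR/$name bs=256k"
    done
}

save_state
```

Keeping the -D / -E text files next to the images means the layout details needed for a later re-create are stored with the data they describe.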
Secondary question: I'm doing a "dd if=/dev/sdX5 bs=256k > /backup/sdX5"
for each disk -- is there a way to run mdadm on the copies and
experiment on those? (It took ~forever to copy a terabyte of the
raw partitions.)
(For the archives) - I didn't try it, but setting up the disk images
as loop devices should work. I didn't think of that yesterday.
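A minimal sketch of that loop-device idea, equally untested: attach each image with losetup and hand the resulting loop devices to mdadm --assemble. The image paths and the /dev/md9 name are hypothetical. In the default dry-run mode the loop device names are shown as placeholders, since the real ones are only known once losetup has run.

```shell
#!/bin/sh
# Sketch: attach backup images as loop devices and assemble from them.
# IMAGES and MD_TEST are hypothetical names; this approach is untested.
IMAGES=${IMAGES:-"/backup/sda5 /backup/sdb5"}
MD_TEST=${MD_TEST:-/dev/md9}
DRY_RUN=${DRY_RUN:-1}    # 1 = only print the commands, 0 = execute them

attach_and_assemble() {
    loops=""
    for img in $IMAGES; do
        if [ "$DRY_RUN" = 1 ]; then
            echo "losetup --find --show $img"
            dev="<loop-for-$(basename "$img")>"   # placeholder name
        else
            # --find picks a free loop device, --show prints its name.
            dev=$(losetup --find --show "$img") || return 1
        fi
        loops="$loops $dev"
    done
    if [ "$DRY_RUN" = 1 ]; then
        echo "mdadm --assemble $MD_TEST$loops"
    else
        # $loops is intentionally unquoted: one argument per loop device.
        mdadm --assemble "$MD_TEST" $loops
    fi
}

attach_and_assemble
```

Anything mdadm writes goes into the image files, so if the images are the only backup it would be safer to experiment on a second copy of them.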
- ask
[1] http://marc.info/?l=linux-raid&m=120065542429935&w=2
[2] And oh man am I glad I backed them up. On my first attempt at
recreating the raid I forgot the md device parameter and
--assume-clean, so it created a raid device on one of my source
partitions and immediately started syncing at ~120MB/sec. Fortunately,
restoring the partitions from the backup worked fine.
--
http://develooper.com/ - http://askask.com/