raid10 messed up by false multipath setup (was: raid10 messed up filesystem, lvm lv ok)

Ask Bjørn Hansen <ask@xxxxxxxxxxxxxx> · Mon, 21 Jan 2008 02:32:08 -0800

On Jan 19, 2008, at 3:44 AM, Ask Bjørn Hansen wrote:

Replying to myself with an update, mostly for the sake of the archives  
(I went through the linux-raid mail from the last year yesterday while  
waiting for my raw-partition backups to finish).

I mentioned[1] my trouble with the multipath detection code in the
Fedora rescue mode messing up my raid yesterday.
[...]
I suspect that maybe the layout of the md device got messed up?  How
can I find out if that's the case?  Would it be possible to recover
from that (assuming all the data is still on some of the disks)?

I realized that of the 11 disks (9 in the raid, 2 spares) one of the  
disks affected by the "fake multipath" mishap was a spare, so after  
backing up all the raw partitions[2] I re-created the raid in place  
with the other affected disk marked missing and it seems like the file  
system is more or less okay.   Yes, I'm doing a backup now.  :-)
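
(For the archives) A minimal sketch of what such a re-create command
looks like.  The helper name print_recreate_cmd and the device names
are hypothetical, not what I actually typed.  It only prints the mdadm
command instead of running it, and it refuses unless the first
argument is an md device, since leaving that out is easy to do (see
[2]).  The level, the device order, the chunk size and the layout all
have to match the original array; chunk and layout options are not
shown here and would have to come from saved mdadm -E output.

```shell
# Hypothetical helper: prints (does NOT run) an mdadm --create command.
# Usage: print_recreate_cmd MD_DEVICE LEVEL MEMBER...
# Use the literal word "missing" in place of a member to leave that slot empty.
print_recreate_cmd() {
    if [ "$#" -lt 3 ]; then
        echo "usage: print_recreate_cmd MD_DEVICE LEVEL MEMBER..." >&2
        return 1
    fi
    md_dev="$1"
    level="$2"
    # Guard against forgetting the md device and passing a member first.
    case "$md_dev" in
        /dev/md*) ;;
        *) echo "refusing: first argument must be the md device" >&2
           return 1 ;;
    esac
    shift 2
    # --assume-clean tells mdadm not to start an initial resync.
    # "missing" entries count towards --raid-devices.
    echo "mdadm --create $md_dev --assume-clean --level=$level --raid-devices=$# $*"
}

# Example with made-up device names:
#   print_recreate_cmd /dev/md0 10 /dev/sdA5 missing /dev/sdC5
```

Printing first and reading the command back before pasting it is the
whole point; on a raw-partition backup that took ~forever to make, a
second look is cheap.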

Lessons:

  1) Do backups of your raid'ed data.   Yes, it can be a pain but  
figure it out.

  2) Keep your root partition on a simple raid1 (or on a lvm group  
that's on a simple raid1).

  3) When the raid goes @#$ - don't panic, make sure nothing is being  
written to the disks and stop.   (Some years ago I lost a raid5 to the  
"oops, had a read-error, drop the last disk" issue and I suspect I  
could have saved it had I been patient and stopped working on it until  
I was more awake).

  4) Have/make copies of the mdadm -D (--detail) and -E (--examine)
output.

  5) If you care about the data, do a backup of your raw partitions  
before trying to restore.

  6) The "create the raid on top of the old raid" trick saves the day  
again (for a while I had some kind of cabling problem on a box with a  
raid6 - I lost track of how many times I did the recreate thing).
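
A rough sketch for lesson 4, in case it saves someone some typing.
The function name save_md_metadata, the output directory and the
device names are all made up for illustration; the MDADM variable is
only there so the command can be swapped out.  It writes the -D output
for the array and the -E output for each member into one file per
device.

```shell
# Command to run; override MDADM to point at a different mdadm binary.
MDADM="${MDADM:-mdadm}"

# Hypothetical helper.
# Usage: save_md_metadata OUT_DIR MD_DEVICE MEMBER...
save_md_metadata() {
    out_dir="$1"
    md_dev="$2"
    shift 2
    mkdir -p "$out_dir" || return 1
    # Array-level view (-D is --detail).
    "$MDADM" -D "$md_dev" > "$out_dir/$(basename "$md_dev").detail" 2>&1
    # Per-member superblock view (-E is --examine).
    for part in "$@"; do
        "$MDADM" -E "$part" > "$out_dir/$(basename "$part").examine" 2>&1
    done
}

# Example, as root, with made-up names:
#   save_md_metadata /root/md-metadata /dev/md0 /dev/sdX5 /dev/sdY5
```

Keep the output somewhere that is not on the array itself.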

Secondary question: I'm doing a "dd if=/dev/sdX5 bs=256k > /backup/sdX5"
for each disk -- is there a way to run mdadm on the copies and
experiment on those?  (It took ~forever to copy a terabyte of the
raw partitions).

(For the archives) - I didn't try it, but setting up the disk images  
as loop devices should work.  I didn't think of that yesterday.
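
Untested, like the idea itself, but roughly what I have in mind.  The
function name attach_images and the RUN variable are made up for this
sketch; with RUN=echo it only prints the losetup commands.  Note that
experimenting on the images changes them, so if they are your only
backup, work on a second copy.

```shell
# Set RUN=echo for a dry run that only prints the commands.
RUN="${RUN:-}"

# Hypothetical helper: attach each image file to a free loop device.
# Usage: attach_images IMAGE...
attach_images() {
    for img in "$@"; do
        # -f picks the first unused loop device, --show prints its name.
        $RUN losetup -f --show "$img" || return 1
    done
}

# Example, as root, with the image names from the dd command above:
#   attach_images /backup/sdX5 /backup/sdY5
```

The /dev/loopN names it prints could then be given to mdadm in place of
the real partitions.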

 - ask

[1] http://marc.info/?l=linux-raid&m=120065542429935&w=2

[2] And oh man am I glad I backed them up.  On my first attempt at
recreating the raid I forgot the md device parameter and
--assume-clean, so it created a raid device on one of my source
partitions and immediately started syncing at ~120MB/sec.  Restoring
the partitions from the backup worked fine, fortunately.

--
http://develooper.com/ - http://askask.com/
