raid10 messed up by false multipath setup (was: raid10 messed up filesystem, lvm lv ok)


 




On Jan 19, 2008, at 3:44 AM, Ask Bjørn Hansen wrote:

Replying to myself with an update, mostly for the sake of the archives (I went through the linux-raid mail from the last year yesterday while waiting for my raw-partition backups to finish).

I mentioned[1] yesterday my trouble with the multipath detection code in the Fedora rescue mode messing up my raid.
[...]
I suspect that maybe the layout of the md device got messed up? How can I find out if that's the case? Would it be possible to recover from that (assuming all the data is still on some of the disks)?

I realized that of the 11 disks (9 in the raid, 2 spares), one of the disks affected by the "fake multipath" mishap was a spare. So after backing up all the raw partitions[2], I re-created the raid in place with the other affected disk marked missing, and the file system seems more or less okay. Yes, I'm doing a backup now. :-)
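For the archives, the in-place re-create looks roughly like the sketch below. Everything in it is a placeholder (device names, level, layout, chunk size, and which slot gets "missing"); the values have to match the original array exactly, so check them against a saved mdadm -E dump before running anything:

  # Re-create the array over the existing members without touching the data.
  # --assume-clean stops mdadm from starting a resync; list the partitions
  # in their original slot order and put "missing" in the slot of the disk
  # you want to leave out.
  mdadm --create /dev/md1 --assume-clean \
        --level=10 --raid-devices=9 --layout=n2 --chunk=64 \
        /dev/sda5 /dev/sdb5 /dev/sdc5 /dev/sdd5 /dev/sde5 \
        /dev/sdf5 /dev/sdg5 /dev/sdh5 missing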

Lessons:

1) Do backups of your raid'ed data. Yes, it can be a pain, but figure it out.

2) Keep your root partition on a simple raid1 (or on an lvm volume group that's on a simple raid1).

3) When the raid goes @#$, don't panic: make sure nothing is being written to the disks, and stop. (Some years ago I lost a raid5 to the "oops, had a read error, drop the last disk" issue, and I suspect I could have saved it had I been patient and stopped working on it until I was more awake.)

4) Have/make copies of the mdadm -D / -E output (example commands below, after this list).

5) If you care about the data, do a backup of your raw partitions before trying to restore.

6) The "create the raid on top of the old raid" trick saves the day again (for a while I had some kind of cabling problem on a box with a raid6 - I lost track of how many times I did the recreate thing).

Secondary question: I'm doing a "dd if=/dev/sdX5 bs=256k > /backup/sdX5" for each disk -- is there a way to run mdadm on the copies and experiment on those? (It took ~forever to copy a terabyte of the raw partitions.)

(For the archives) - I didn't try it, but setting up the disk images as loop devices should work. I didn't think of that yesterday.
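A rough sketch of the loop-device idea, assuming images made with the dd command above (the loop numbers, image names and scratch array name are made up; losetup -r and mdadm's --readonly should keep the experiments from writing to the images):

  # Attach each partition image to a read-only loop device.
  losetup -r /dev/loop0 /backup/sda5
  losetup -r /dev/loop1 /backup/sdb5
  # ...and so on, one loop device per image.

  # Assemble a scratch array from the loop devices, read-only.
  mdadm --assemble --readonly /dev/md9 /dev/loop0 /dev/loop1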


 - ask


[1] http://marc.info/?l=linux-raid&m=120065542429935&w=2


[2] And oh man am I glad I backed them up. On my first attempt at re-creating the raid I forgot the md device parameter and --assume-clean, so mdadm created a raid device on one of my source partitions and immediately started syncing at ~120MB/sec. Fortunately, restoring the partitions from the backup worked fine.

--
http://develooper.com/ - http://askask.com/


