On Fri, Feb 12, 2010 at 5:56 PM, Dmitry Teytelman <dim@xxxxxxxxxx> wrote:
> Hello,
>
> I've made a mess of my raid setup and am desperately trying to save
> it. The setup is RAID-5 on 3 SATA disks. Problems started with one of
> the disks getting unrecoverable read errors. Unfortunately I was away
> on a trip and the machine was used by my family while this was going
> on :(
>
> The array consists of three devices: /dev/sda2, /dev/sdc2, and /dev/sdd2.
> When I got back from the trip I found the following:
>
> 1. Two disks were removed from the array, leaving only /dev/sda2;
> 2. When either of the two was added, the array would start;
> 3. One combination of two disks (/dev/sda2 + /dev/sdd2) produced a
>    running /dev/md0 with a proper ext3 filesystem seen on the drive
>    (even passing fsck);
>
> At this point I added /dev/sdc2 and the reconstruction started.
> However, it did not complete, since /dev/sdd2 has unrecoverable errors.
> Reading the list archives I figured I need another drive to ddrescue
> /dev/sdd2, then perform the reconstruction.
>
> However, at some point during/after the reconstruction the situation
> changed. Now both /dev/sdc2 and /dev/sdd2 are marked as spare
> drives (see mdadm -E output below) and I cannot start the array. I
> think /dev/sdd2 should be in sync with /dev/sda2, but how can I bring
> it back (it used to be device 2)?
>
> /dev/sda2:
>           Magic : a92b4efc
>         Version : 0.90.01
>            UUID : bd5c2dc0:f76e5f10:a98c4de7:f2020715
>   Creation Time : Fri Jun 17 11:47:44 2005
>      Raid Level : raid5
>   Used Dev Size : 486375808 (463.84 GiB 498.05 GB)
>      Array Size : 972751616 (927.69 GiB 996.10 GB)
>    Raid Devices : 3
>   Total Devices : 1
> Preferred Minor : 0
>
>     Update Time : Fri Feb 12 14:07:35 2010
>           State : active
>  Active Devices : 1
> Working Devices : 1
>  Failed Devices : 2
>   Spare Devices : 0
>        Checksum : a4cd0c48 - correct
>          Events : 2125155
>
>          Layout : left-symmetric
>      Chunk Size : 128K
>
>       Number   Major   Minor   RaidDevice State
> this     0       8        2        0      active sync   /dev/sda2
>
>    0     0       8        2        0      active sync   /dev/sda2
>    1     1       0        0        1      faulty removed
>    2     2       0        0        2      faulty removed
>
> /dev/sdc2:
>           Magic : a92b4efc
>         Version : 0.90.01
>            UUID : bd5c2dc0:f76e5f10:a98c4de7:f2020715
>   Creation Time : Fri Jun 17 11:47:44 2005
>      Raid Level : raid5
>   Used Dev Size : 486375808 (463.84 GiB 498.05 GB)
>      Array Size : 972751616 (927.69 GiB 996.10 GB)
>    Raid Devices : 3
>   Total Devices : 1
> Preferred Minor : 0
>
>     Update Time : Fri Feb 12 10:30:00 2010
>           State : active
>  Active Devices : 1
> Working Devices : 1
>  Failed Devices : 2
>   Spare Devices : 0
>        Checksum : a4ccd973 - correct
>          Events : 2125153
>
>          Layout : left-symmetric
>      Chunk Size : 128K
>
>       Number   Major   Minor   RaidDevice State
> this     4       8        2       -1      spare   /dev/sdc2
>
>    0     0       8       34        0      active sync   /dev/sda2
>    1     1       0        0        1      faulty removed
>    2     2       0        0        2      faulty removed
>
> /dev/sdd2:
>           Magic : a92b4efc
>         Version : 0.90.01
>            UUID : bd5c2dc0:f76e5f10:a98c4de7:f2020715
>   Creation Time : Fri Jun 17 11:47:44 2005
>      Raid Level : raid5
>   Used Dev Size : 486375808 (463.84 GiB 498.05 GB)
>      Array Size : 972751616 (927.69 GiB 996.10 GB)
>    Raid Devices : 3
>   Total Devices : 2
> Preferred Minor : 0
>
>     Update Time : Fri Feb 12 10:36:05 2010
>           State : active
>  Active Devices : 1
> Working Devices : 2
>  Failed Devices : 2
>   Spare Devices : 1
>        Checksum : a4ccdb48 - correct
>          Events : 2125154
>
>          Layout : left-symmetric
>      Chunk Size : 128K
>
>       Number   Major   Minor   RaidDevice State
> this     3       8       50        3      spare   /dev/sdd2
>
>    0     0       8       34        0      active sync   /dev/sda2
>    1     1       0        0        1      faulty removed
>    2     2       0        0        2      faulty removed
>    3     3       8       50        3      spare   /dev/sdd2
>
>
> --
> Dmitry Teytelman
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

It sounds like you've reached the point where you've done something silly and need to try to recover what you can.

READ THE WIKI PAGES BELOW CAREFULLY and perform READ-ONLY recovery: see if you can find a permutation (device order) that can be mounted read-only and READ with valid data. DO NOT REUSE your current disks; they have been kicked for a FAILURE TO WRITE (if you're using a recent kernel) and should be considered walking dead.

If you are able to read your data, copy it off to a fresh array. IF your new disks are large enough to hold both a complete copy of your old drives and your new array, then dd_rescue the old drives that form a valid combination onto the end of the new disks and create the new array at the beginning. When you're done copying, take the rescued array offline along with the new one, extend the new array's partitions over the freed space, and grow the array into the enlarged partitions (along with anything stacked on it, like LVM).

http://raid.wiki.kernel.org/index.php/RAID_Recovery
http://raid.wiki.kernel.org/index.php/Permute_array.pl

#!/usr/bin/perl -w

# If you forgot how you built an array and need to try various
# permutations then this is for you...
# based on Mark-Jason Dominus' mjd_permute: permute each word of input

use strict;
use Getopt::Long;

sub usage {
    return "syntax: permute_array --md <md_device> --mount <mountpoint> [--opts <mdadm options>] [--for_real] <all devices>\n";
}

my $MD_DEVICE;
my $MOUNTPOINT;
my $MDADM_OPTS = "";
my $REAL;

################################################################
# This function is passed each permutation of component devices.
# This includes a 'missing' device.
# This is the place to hack command variations etc...
sub try_array {
    # @_ looks like: ("/dev/sda1", "missing", "/dev/sdb1")
    my @device_list = @_;
    my $num_devices = scalar @_;

    # This may need a --force... <gulp>
    my $create = "yes | mdadm --create $MD_DEVICE --raid-devices=$num_devices --level=5 $MDADM_OPTS @device_list 2>/dev/null\n";
    # Don't forget to mount read-only
    my $mount  = "mount -oro $MD_DEVICE $MOUNTPOINT 2>/dev/null";
    my $umount = "umount $MOUNTPOINT 2>/dev/null";
    # and stop the array...
    my $stop   = "mdadm --stop $MD_DEVICE 2>/dev/null";

    # REAL == --for_real option
    if ($REAL) {
        # we expect this to succeed
        system $create;
        if (my $err = $? >> 8) {
            die "command : $create\n exited with status $err\n\n";
        }

        # we expect this to fail and are happy if it succeeds
        system $mount;
        if (!(my $err = $? >> 8)) {
            print "Success. possible command : \n $create\n";
            system $umount;
        }

        # we expect this to succeed
        system $stop;
        if (my $err = $? >> 8) {
            die "command : $stop\n exited with status $err\n\n";
        }
    } else {
        # Just show the create/mount/stop commands
        # If you want more control you could use this to write a script
        print "$create\n$mount\n$stop\n";
    }
}

################################################################
# Execution starts here...
#
sub factorial($);

GetOptions('md=s'     => \$MD_DEVICE,
           "mount=s"  => \$MOUNTPOINT,
           "opts=s"   => \$MDADM_OPTS,
           "for_real" => \$REAL);

if (!defined($MD_DEVICE) or !defined($MOUNTPOINT)) {
    die &usage;
}

print "using device $MD_DEVICE and mounting on $MOUNTPOINT\n";

# we *always* assume a 'missing' device - not doing so will destroy
# the array...
my @devices = @ARGV;

# how many devices?
my $num_devices = scalar @devices;
if ($num_devices < 2) {
    die "$0 needs at least two component devices\n";
}

# how many base permutations...
my $num_permutations = factorial(scalar @devices);

# try all permutations, substituting 'missing' for each device in
# turn...
for (my $d = 0; $d < $num_devices; $d++) {
    my $skip_device = $devices[$d];
    $devices[$d] = "missing";
    print "skipping $skip_device\n\n";
    for (my $i = 0; $i < $num_permutations; $i++) {
        my @permutation = @devices[n2perm($i, $#devices)];
        try_array(@permutation);
    }
    $devices[$d] = $skip_device;
}

################################################################
# permutation code

# n2pat($N, $len) : produce the $N-th pattern, $len + 1 digits long
# (the digits of $N in the factorial number system, lowest first)
sub n2pat {
    my $i = 1;
    my $N = shift;
    my $len = shift;
    my @pat;
    while ($i <= $len + 1) {   # Should really be just while ($N) { ...
        push @pat, $N % $i;
        $N = int($N / $i);
        $i++;
    }
    return @pat;
}

# pat2perm(@pat) : turn pattern returned by n2pat() into
# permutation of integers. XXX: splice is already O(N)
sub pat2perm {
    my @pat = @_;
    my @source = (0 .. $#pat);
    my @perm;
    push @perm, splice(@source, (pop @pat), 1) while @pat;
    return @perm;
}

# n2perm($N, $len) : generate the Nth permutation of $len + 1 objects
# (callers pass the last index, e.g. $#devices)
sub n2perm {
    pat2perm(n2pat(@_));
}

# Utility function: factorial with memoizing
BEGIN {
    my @fact = (1);
    sub factorial($) {
        my $n = shift;
        return $fact[$n] if defined $fact[$n];
        $fact[$n] = $n * factorial($n - 1);
    }
}
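The Events counters in the `mdadm -E` dumps above show how current each superblock is. /dev/sda2 has the highest count (2125155), /dev/sdd2 is one behind (2125154) and /dev/sdc2 is two behind (2125153). That fits the guess that /dev/sdd2 is the better partner for /dev/sda2.

Below is a minimal Perl sketch for comparing them. It assumes the 0.90-superblock text format quoted above, with Events printed as a plain integer. `parse_examine` is a hypothetical helper, and the embedded strings are excerpts of the three dumps rather than live output:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: pull the Events counter and this device's own slot,
# state and name out of one device's `mdadm -E` text. Assumes the 0.90
# superblock layout quoted above, with Events as a plain integer.
sub parse_examine {
    my ($text) = @_;
    my ($events) = $text =~ /^\s*Events\s*:\s*(\d+)\s*$/m;
    # "this" row: Number Major Minor RaidDevice State... /dev/name
    my ($slot, $state, $dev) =
        $text =~ /^\s*this\s+\d+\s+\d+\s+\d+\s+(-?\d+)\s+(.*?)\s+(\/dev\/\S+)\s*$/m;
    return { dev => $dev, slot => $slot, state => $state, events => $events };
}

# Excerpts of the three dumps above (only the lines the parser needs).
my @dumps = (
    "Events : 2125155\nthis 0 8 2 0 active sync /dev/sda2\n",
    "Events : 2125153\nthis 4 8 2 -1 spare /dev/sdc2\n",
    "Events : 2125154\nthis 3 8 50 3 spare /dev/sdd2\n",
);

# Rank devices by Events counter, most recently updated superblock first.
my @ranked = sort { $b->{events} <=> $a->{events} } map { parse_examine($_) } @dumps;
my $newest = $ranked[0]{events};

for my $d (@ranked) {
    printf "%-10s events=%d behind=%d slot=%d state=%s\n",
        $d->{dev}, $d->{events}, $newest - $d->{events}, $d->{slot}, $d->{state};
}
```

On a live system the strings could instead be filled from `mdadm -E /dev/sdX2`. The sketch only reads text and never touches the array.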
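The permute script above makes n × n! create/mount attempts. Each component is replaced by `missing` in turn. The n! orderings of the resulting list are then generated by writing the counter `$i` in the factorial number system (`n2pat`) and repeatedly picking elements out of an index list (`pat2perm`).

Here is a standalone sketch that only prints the orders the loop would try, without calling mdadm, using the three partitions from this thread. The helpers mirror the script's logic but are a hedged rewrite, not the wiki's code:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Digits of $n in the factorial number system, lowest first
# ($len + 1 of them), as n2pat() in the script produces.
sub n2pat {
    my ($n, $len) = @_;
    my @pat;
    for my $radix (1 .. $len + 1) {
        push @pat, $n % $radix;
        $n = int($n / $radix);
    }
    return @pat;
}

# Turn a pattern into a permutation of 0..$#pat: the highest digit picks
# (and removes) an index from the remaining list, then the next, and so on.
sub pat2perm {
    my @pat    = @_;
    my @source = (0 .. $#pat);
    my @perm;
    push @perm, splice(@source, pop(@pat), 1) while @pat;
    return @perm;
}

# Device names taken from the thread above.
my @devices = qw(/dev/sda2 /dev/sdc2 /dev/sdd2);

my $orderings = 1;
$orderings *= $_ for 1 .. scalar @devices;   # n!

my @attempts;
for my $d (0 .. $#devices) {
    my @with_missing = @devices;
    $with_missing[$d] = 'missing';           # leave one member out
    for my $i (0 .. $orderings - 1) {
        my @order = @with_missing[pat2perm(n2pat($i, $#devices))];
        push @attempts, join ' ', @order;
    }
}

print "$_\n" for @attempts;
printf "%d attempts\n", scalar @attempts;
```

For the three partitions here that is 3 × 3! = 18 candidate orders. A `--for_real` run could therefore call `mdadm --create` up to 18 times; it dies on the first create or stop that fails.

The script hard-codes only `--level=5`. The superblocks above report 0.90 metadata, a 128K chunk and a left-symmetric layout. Matching options such as `--metadata=0.90 --chunk=128 --layout=left-symmetric` would presumably have to be passed through `--opts`, since mdadm's defaults may differ.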