Re: Can this setup be saved?

On Fri, Feb 12, 2010 at 5:56 PM, Dmitry Teytelman <dim@xxxxxxxxxx> wrote:
> Hello,
>
> I've made a mess of my raid setup and am desperately trying to save
> it. The setup is RAID-5 on 3 SATA disks. Problems started with one of
> the disks getting unrecoverable read errors. Unfortunately I was away
> on a trip and the machine was used by my family while this was going
> on :(
>
> Array consists of three devices: /dev/sda2, /dev/sdc2, and /dev/sdd2.
> When I got back from the trip I found the following:
>
> 1. Two disks were removed from the array, leaving only /dev/sda2;
> 2. When either of the two was added, the array would start;
> 3. One combination of two disks (/dev/sda2 + /dev/sdd2) produced a
> running /dev/md0 with a proper ext3 filesystem seen on the drive (even
> passing fsck);
>
> At this point I added /dev/sdc2 and the reconstruction started.
> However, it did not complete, since /dev/sdd2 has unrecoverable errors.
> Reading the list archives, I figured I need another drive to ddrescue
> /dev/sdd2 and then perform the reconstruction.
>
> However, at some point during or after the reconstruction the situation
> changed. Now both /dev/sdc2 and /dev/sdd2 are marked as spare
> drives (see mdadm -E output below) and I cannot start the array. I
> think /dev/sdd2 should be in sync with /dev/sda2, but how can I bring
> it back (it used to be device 2)?
>
> /dev/sda2:
>          Magic : a92b4efc
>        Version : 0.90.01
>           UUID : bd5c2dc0:f76e5f10:a98c4de7:f2020715
>  Creation Time : Fri Jun 17 11:47:44 2005
>     Raid Level : raid5
>  Used Dev Size : 486375808 (463.84 GiB 498.05 GB)
>     Array Size : 972751616 (927.69 GiB 996.10 GB)
>   Raid Devices : 3
>  Total Devices : 1
> Preferred Minor : 0
>
>    Update Time : Fri Feb 12 14:07:35 2010
>          State : active
>  Active Devices : 1
> Working Devices : 1
>  Failed Devices : 2
>  Spare Devices : 0
>       Checksum : a4cd0c48 - correct
>         Events : 2125155
>
>         Layout : left-symmetric
>     Chunk Size : 128K
>
>      Number   Major   Minor   RaidDevice State
> this     0       8        2        0      active sync   /dev/sda2
>
>   0     0       8        2        0      active sync   /dev/sda2
>   1     1       0        0        1      faulty removed
>   2     2       0        0        2      faulty removed
>
> /dev/sdc2:
>          Magic : a92b4efc
>        Version : 0.90.01
>           UUID : bd5c2dc0:f76e5f10:a98c4de7:f2020715
>  Creation Time : Fri Jun 17 11:47:44 2005
>     Raid Level : raid5
>  Used Dev Size : 486375808 (463.84 GiB 498.05 GB)
>     Array Size : 972751616 (927.69 GiB 996.10 GB)
>   Raid Devices : 3
>  Total Devices : 1
> Preferred Minor : 0
>
>    Update Time : Fri Feb 12 10:30:00 2010
>          State : active
>  Active Devices : 1
> Working Devices : 1
>  Failed Devices : 2
>  Spare Devices : 0
>       Checksum : a4ccd973 - correct
>         Events : 2125153
>
>         Layout : left-symmetric
>     Chunk Size : 128K
>
>      Number   Major   Minor   RaidDevice State
> this     4       8        2       -1      spare   /dev/sdc2
>
>   0     0       8       34        0      active sync   /dev/sda2
>   1     1       0        0        1      faulty removed
>   2     2       0        0        2      faulty removed
>
> /dev/sdd2:
>          Magic : a92b4efc
>        Version : 0.90.01
>           UUID : bd5c2dc0:f76e5f10:a98c4de7:f2020715
>  Creation Time : Fri Jun 17 11:47:44 2005
>     Raid Level : raid5
>  Used Dev Size : 486375808 (463.84 GiB 498.05 GB)
>     Array Size : 972751616 (927.69 GiB 996.10 GB)
>   Raid Devices : 3
>  Total Devices : 2
> Preferred Minor : 0
>
>    Update Time : Fri Feb 12 10:36:05 2010
>          State : active
>  Active Devices : 1
> Working Devices : 2
>  Failed Devices : 2
>  Spare Devices : 1
>       Checksum : a4ccdb48 - correct
>         Events : 2125154
>
>         Layout : left-symmetric
>     Chunk Size : 128K
>
>      Number   Major   Minor   RaidDevice State
> this     3       8       50        3      spare   /dev/sdd2
>
>   0     0       8       34        0      active sync   /dev/sda2
>   1     1       0        0        1      faulty removed
>   2     2       0        0        2      faulty removed
>   3     3       8       50        3      spare   /dev/sdd2
>
>
> --
> Dmitry Teytelman
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

It sounds like you've reached the point where you've done something
silly and need to try to recover what you can.

READ THE PAGES LINKED BELOW CAREFULLY. Perform READ-ONLY recovery, and see
if you can find a permutation of the devices that can be mounted
read-only and READ with valid data.
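
To give a sense of the search space: if one member is always left out as
'missing' and the order of the others is unknown, there are n x n!
candidate orders to try, which for this three-disk array is 3 x 3! = 18.
A minimal Python sketch that enumerates them (candidate_layouts is a
made-up name for illustration, not part of any tool):

```python
from itertools import permutations
from math import factorial


def candidate_layouts(devices):
    """Yield every device order obtained by replacing each member in turn
    with 'missing' and then trying the resulting list in every order."""
    for skipped in range(len(devices)):
        trial = list(devices)
        trial[skipped] = "missing"  # never try a complete array
        for order in permutations(trial):
            yield list(order)


devices = ["/dev/sda2", "/dev/sdc2", "/dev/sdd2"]
layouts = list(candidate_layouts(devices))
print(len(layouts), len(devices) * factorial(len(devices)))
for layout in layouts:
    print(" ".join(layout))
```

One of the 18, "/dev/sda2 missing /dev/sdd2", matches the slots the
original poster remembers (sda2 as device 0, sdd2 as device 2).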

DO NOT REUSE your current disks. They have been kicked from the array for
a FAILURE TO WRITE (with a recent kernel, md rewrites a block that returned
a read error and kicks the disk if that rewrite fails), and should be
considered walking dead. If you are able to read your data, copy it off
to a fresh array.

IF your new disks are large enough to hold both a complete copy of an old
drive and a member of the new array:

1. dd_rescue the old drives that form a valid combination onto the end of
   the new disks;
2. create the new array in partitions at the beginning of the new disks
   and copy your data onto it;
3. when you're done copying, take the new array offline too, extend its
   partitions over the space that held the rescued images, and grow the
   array into the enlarged partitions (along with anything layered on top
   of it, such as LVM and the filesystem).
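
A rough sizing check for that plan, as a minimal Python sketch. It
assumes (this is not in the advice above) that each new disk carries one
rescued old member at its end plus one member of a new three-disk RAID-5
at its start, and that the new array must be at least as large as the old
one. The inputs are the Used Dev Size and Array Size values, in KiB, from
the mdadm -E output quoted above; min_new_disk_kib is a hypothetical name.

```python
KIB_PER_GIB = 1024 ** 2


def min_new_disk_kib(old_member_kib, old_array_kib, new_members):
    """Smallest new disk, in KiB, that fits a rescued image of one old member
    (at the end) plus one member of a new RAID-5 (at the start) whose
    capacity, (new_members - 1) * member size, is at least old_array_kib."""
    # Ceiling division: RAID-5 spends one member's worth of space on parity.
    new_member_kib = -(-old_array_kib // (new_members - 1))
    return old_member_kib + new_member_kib


# Used Dev Size and Array Size from the mdadm -E output, in KiB
need = min_new_disk_kib(486375808, 972751616, new_members=3)
print(f"{need} KiB = {need / KIB_PER_GIB:.2f} GiB = {need * 1024 / 1e9:.2f} GB")
```

For this array it comes to 972751616 KiB (about 927.69 GiB, or 996.10 GB),
so a 1 TB (10^12 byte) disk would leave only about 4 GB to spare.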

http://raid.wiki.kernel.org/index.php/RAID_Recovery

http://raid.wiki.kernel.org/index.php/Permute_array.pl

#!/usr/bin/perl -w

# If you forgot how you built an array and need to try various
# permutations then this is for you...

# based on Mark-Jason Dominus' mjd_permute: permute each word of input

use strict;
use Getopt::Long;

sub usage {
  return "syntax: permute_array --md <md_device> --mount <mountpoint>"
       . " [--opts <mdadm options>] [--for_real] <all devices>\n";
}

my $MD_DEVICE;
my $MOUNTPOINT;
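# Extra mdadm --create options. Match the original geometry from mdadm -E,
# e.g. --opts "--metadata=0.90 --chunk=128" for the array above; a
# superblock format or chunk size that differs from the original will not
# reproduce its layout.
my $MDADM_OPTS="";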
my $REAL;

################################################################
# This function is passed each permutation of component devices.
# This includes a 'missing' device.
# This is the place to hack command variations etc...
sub try_array {
  # @_ looks like: ("/dev/sda1", "missing", "/dev/sdb1")
  my @device_list = @_;
  my $num_devices = scalar @_;

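  # This may need a --force... <gulp>
  # Note: --create writes a new md superblock to every listed device.
  # The command must stay on one shell line, hence the concatenation.
  my $create = "yes | mdadm --create $MD_DEVICE --raid-devices=$num_devices"
             . " --level=5 $MDADM_OPTS @device_list 2>/dev/null";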
  # Don't forget to mount read-only
  my $mount =  "mount -oro $MD_DEVICE $MOUNTPOINT 2>/dev/null";
  my $umount =  "umount $MOUNTPOINT 2>/dev/null";
  # and stop the array...
  my $stop = "mdadm --stop $MD_DEVICE 2>/dev/null";

  # REAL == --for_real option
  if ($REAL) {
    # we expect this to succeed
    system $create;
    if (my $err = $?>>8) {
      die "command : $create\n   exited with status $err\n\n";
    }
    # we expect this to fail and are happy if it succeeds
    system $mount;
    if (!(my $err = $?>>8)) {
      print "Success. possible command : \n  $create\n";
      system $umount;
    }
    # we expect this to succeed
    system $stop;
    if (my $err = $?>>8) {
      die "command : $stop\n   exited with status $err\n\n";
    }
  } else {
    # Just show the create/mount/stop commands
    # If you want more control you could use this to write a script
    print "$create\n$mount\n$stop\n";
  }
}


################################################################
# Execution starts here...
#
sub factorial($);

GetOptions ('md=s'      => \$MD_DEVICE,
	    "mount=s"   => \$MOUNTPOINT,
	    "opts=s"    => \$MDADM_OPTS,
	    "for_real"  => \$REAL);

if (!defined($MD_DEVICE) or !defined($MOUNTPOINT)) {
  die &usage;
}

print "using device $MD_DEVICE and mounting on $MOUNTPOINT\n";

# we *always* assume a 'missing' device - creating a complete array
# starts a resync that writes to the members and will destroy the array...
my @devices = @ARGV;
# how many devices?
my $num_devices = scalar @devices;
if ($num_devices < 2) {
  die "$0 needs at least two component devices\n";
}
# how many base permutations...
my $num_permutations = factorial(scalar @devices);
# try all permutations, substituting 'missing' for each device in
# turn...
for (my $d=0; $d < $num_devices; $d++) {
  my $skip_device = $devices[$d];
  $devices[$d] = "missing";
  print "skipping $skip_device\n\n";
  for (my $i=0; $i < $num_permutations; $i++) {
    my @permutation = @devices[n2perm($i, $#devices)];
    try_array(@permutation);
  }
  $devices[$d] = $skip_device;
}

################################################################
# permutation code

# n2pat($N, $len) : produce the $N-th pattern of length $len + 1
sub n2pat {
    my $i   = 1;
    my $N   = shift;
    my $len = shift;
    my @pat;
    while ($i <= $len + 1) {   # Should really be just while ($N) { ...
        push @pat, $N % $i;
        $N = int($N/$i);
        $i++;
    }
    return @pat;
}

# pat2perm(@pat) : turn pattern returned by n2pat() into
# permutation of integers.  XXX: splice is already O(N)
sub pat2perm {
    my @pat    = @_;
    my @source = (0 .. $#pat);
    my @perm;
    push @perm, splice(@source, (pop @pat), 1) while @pat;
    return @perm;
}

# n2perm($N, $len) : generate the Nth permutation of $len + 1 objects
sub n2perm {
    pat2perm(n2pat(@_));
}

# Utility function: factorial with memoizing
BEGIN {
  my @fact = (1);
  sub factorial($) {
    my $n = shift;
    return $fact[$n] if defined $fact[$n];
    $fact[$n] = $n * factorial($n - 1);
  }
}
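
n2perm() relies on the factorial number system: n2pat() writes $N with
radices 1, 2, 3, ... (least significant first), and pat2perm() uses the
most significant digit to pick the first element from the indices still
available, the next digit the second, and so on. A straight Python
transcription, for illustration only and not part of the original script,
shows that indices 0 to 5 give the six orders of three devices in
lexicographic order:

```python
def n2pat(n, length):
    """Digits of n in the factorial number system, least significant first.
    Digit k uses radix k + 1, so the first digit is always 0."""
    pat = []
    for radix in range(1, length + 2):
        pat.append(n % radix)
        n //= radix
    return pat


def pat2perm(pat):
    """Map the digits to a permutation of 0..len(pat)-1: the most significant
    digit chooses the first element from the indices still available."""
    pat = list(pat)
    source = list(range(len(pat)))
    perm = []
    while pat:
        perm.append(source.pop(pat.pop()))
    return perm


def n2perm(n, length):
    """The n-th permutation of length + 1 objects (length is $#devices)."""
    return pat2perm(n2pat(n, length))


devices = ["missing", "/dev/sdc2", "/dev/sdd2"]
for i in range(6):  # 3! permutations
    print(i, [devices[j] for j in n2perm(i, len(devices) - 1)])
```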
