RE: RAID 5 array recovery - two drive errors in external enclosure

It's the way you are listing the drives: a bracket glob like /dev/sd[bdce]1 expands to
the matching device names in sorted order, no matter how you arrange the letters inside
the brackets.  Look at this command:
# echo /dev/sd[bdce]1
/dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1

Notice that the output is not in the order given in the command.  To control the order,
list each disk explicitly, in the order you want, like this:
mdadm -C /dev/md0 -l 5 -n 5 -c 256 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 missing
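
Between attempts, stop the previous array before creating the next one, and check
which slot each partition actually landed in before you try the mount.  Roughly like
this (untested on my end; the order shown is just an example, so substitute whichever
permutation you are testing, and keep everything read-only; /mnt/teradata is the mount
point from your earlier mail):

# mdadm -S /dev/md0
# mdadm -C /dev/md0 -l 5 -n 5 -c 256 /dev/sdb1 /dev/sdd1 /dev/sdc1 /dev/sde1 missing
# mdadm -D /dev/md0            # the Number/RaidDevice table shows the order actually used
# fsck.ext3 -n /dev/md0        # read-only filesystem check, makes no changes
# mount -o ro /dev/md0 /mnt/teradata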

I hope this helps.
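
P.S.  If you do want the shell to expand the names for you, brace expansion keeps the
order you type (unlike the bracket glob, which sorts), assuming bash or another shell
that does brace expansion.  Something like this should be equivalent to spelling the
disks out:

# echo /dev/sd{b,d,c,e}1
/dev/sdb1 /dev/sdd1 /dev/sdc1 /dev/sde1
# mdadm -C /dev/md0 -l 5 -n 5 -c 256 /dev/sd{b,d,c,e}1 missing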

} -----Original Message-----
} From: linux-raid-owner@xxxxxxxxxxxxxxx [mailto:linux-raid-
} owner@xxxxxxxxxxxxxxx] On Behalf Of Tim Bostrom
} Sent: Thursday, September 17, 2009 7:55 PM
} To: linux-raid
} Subject: Re: RAID 5 array recovery - two drives errors in external
} enclosure
} 
} It's still showing the order that you had previously posted:  [bcde]
} (see log below)
} 
} It appears that trying different permutations isn't yielding any
} change.  I haven't tried every permutation, but are these commands
} supposed to yield different effects?  They seem to always build the
} array as [bcde] no matter what.  Or should I be swapping around the
} cables on the drives?
} 
} >> mdadm -C /dev/md0 -l 5 -n 5 -c 256 /dev/sd[bdce]1 missing
} >> mdadm -C /dev/md0 -l 5 -n 5 -c 256 /dev/sd[bdec]1 missing
} >> mdadm -C /dev/md0 -l 5 -n 5 -c 256 /dev/sd[becd]1 missing
} 
} 
} -Tim
} 
} [root@tera ~]# mdadm --examine /dev/sdb1
} /dev/sdb1:
}           Magic : a92b4efc
}         Version : 0.90.00
}            UUID : 9fefb6ce:dcbfe649:f456b3f0:371e8bcc
}   Creation Time : Thu Sep 17 16:13:45 2009
}      Raid Level : raid5
}   Used Dev Size : 976759808 (931.51 GiB 1000.20 GB)
}      Array Size : 3907039232 (3726.04 GiB 4000.81 GB)
}    Raid Devices : 5
}   Total Devices : 5
} Preferred Minor : 0
} 
}     Update Time : Thu Sep 17 16:13:45 2009
}           State : clean
}  Active Devices : 4
} Working Devices : 4
}  Failed Devices : 1
}   Spare Devices : 0
}        Checksum : 20f1deab - correct
}          Events : 1
} 
}          Layout : left-symmetric
}      Chunk Size : 256K
} 
}       Number   Major   Minor   RaidDevice State
} this     0       8       17        0      active sync   /dev/sdb1
} 
}    0     0       8       17        0      active sync   /dev/sdb1
}    1     1       8       33        1      active sync   /dev/sdc1
}    2     2       8       49        2      active sync   /dev/sdd1
}    3     3       8       65        3      active sync   /dev/sde1
}    4     4       0        0        4      faulty
} 
} 
} 
} On Thu, Sep 17, 2009 at 4:28 PM, Majed B. <majedb@xxxxxxxxx> wrote:
} > Before creating the array, did you re-examine the disks with mdadm and
} > make sure of each disk's position in the array?
} >
} > After your recabling, the disk names may have changed again.
} >
} > mdadm --examine /dev/sdb1
} >
} >      Number   Major   Minor   RaidDevice State
} > this     7       8       17        7      active sync   /dev/sdb1
} >
} >   0     0       8      113        0      active sync   /dev/sdh1
} >   1     1       8       97        1      active sync   /dev/sdg1
} >   2     2       0        0        2      faulty removed
} >   3     3       0        0        3      faulty removed
} >   4     4       8       33        4      active sync   /dev/sdc1
} >   5     5       8       65        5      active sync   /dev/sde1
} >   6     6       8       49        6      active sync   /dev/sdd1
} >   7     7       8       17        7      active sync   /dev/sdb1
} >
} > (That's the output of an array I'm working on)
} >
} > Notice the first line: *this* and then the value of RaidDevice. That's
} > the position of the partition in the array. 0 is first, 1 is second,
} > and so on.
} >
} > In my case, the order is: sdh1,sdg1,missing,missing,sdc1,sde1,sdd1,sdb1
} >
} > On Fri, Sep 18, 2009 at 2:11 AM, Tim Bostrom <tbostrom@xxxxxxxxx> wrote:
} >> I re-cabled the drives so that they show up as the same drive letter
} >> as they were before when in the enclosure.
} >>
} >> I then went ahead and tried your idea of restarting the array. I tried
} >> this first:
} >>
} >> mdadm -C /dev/md0 -l 5 -n 5 -c 256 /dev/sd[bcde]1 missing
} >>
} >> mount -o ro /dev/md0 /mnt/teradata
} >>
} >> /var/log/messages:
} >> -----------------
} >> Sep 17 16:07:09 tera kernel: md: bind<sdb1>
} >> Sep 17 16:07:09 tera kernel: md: bind<sdc1>
} >> Sep 17 16:07:09 tera kernel: md: bind<sdd1>
} >> Sep 17 16:07:09 tera kernel: md: bind<sde1>
} >> Sep 17 16:07:09 tera kernel: raid5: device sde1 operational as raid
} disk 3
} >> Sep 17 16:07:09 tera kernel: raid5: device sdd1 operational as raid
} disk 2
} >> Sep 17 16:07:09 tera kernel: raid5: device sdc1 operational as raid
} disk 1
} >> Sep 17 16:07:09 tera kernel: raid5: device sdb1 operational as raid
} disk 0
} >> Sep 17 16:07:09 tera kernel: raid5: allocated 5268kB for md0
} >> Sep 17 16:07:09 tera kernel: raid5: raid level 5 set md0 active with 4
} >> out of 5 devices, algorithm 2
} >> Sep 17 16:07:09 tera kernel: RAID5 conf printout:
} >> Sep 17 16:07:09 tera kernel: --- rd:5 wd:4
} >> Sep 17 16:07:09 tera kernel: disk 0, o:1, dev:sdb1
} >> Sep 17 16:07:09 tera kernel: disk 1, o:1, dev:sdc1
} >> Sep 17 16:07:09 tera kernel: disk 2, o:1, dev:sdd1
} >> Sep 17 16:07:09 tera kernel: disk 3, o:1, dev:sde1
} >> Sep 17 16:07:56 tera kernel: EXT3-fs error (device md0):
} >> ext3_check_descriptors: Block bitmap for group 8064 not in group
} >> (block 532677632)!
} >> Sep 17 16:07:56 tera kernel: EXT3-fs: group descriptors corrupted!
} >> --------------------------------
} >>
} >>
} >> I then tried a few more permutations of the command:
} >> mdadm -C /dev/md0 -l 5 -n 5 -c 256 /dev/sd[bdce]1 missing
} >> mdadm -C /dev/md0 -l 5 -n 5 -c 256 /dev/sd[bdec]1 missing
} >> mdadm -C /dev/md0 -l 5 -n 5 -c 256 /dev/sd[becd]1 missing
} >>
} >> Every time I changed the order, it would still print the order the
} >> same in the log:
} >>
} >> Sep 17 16:02:52 tera kernel: md: bind<sdb1>
} >> Sep 17 16:02:52 tera kernel: md: bind<sdc1>
} >> Sep 17 16:02:52 tera kernel: md: bind<sdd1>
} >> Sep 17 16:02:52 tera kernel: md: bind<sde1>
} >> Sep 17 16:02:52 tera kernel: raid5: device sde1 operational as raid
} disk 3
} >> Sep 17 16:02:52 tera kernel: raid5: device sdd1 operational as raid
} disk 2
} >> Sep 17 16:02:52 tera kernel: raid5: device sdc1 operational as raid
} disk 1
} >> Sep 17 16:02:52 tera kernel: raid5: device sdb1 operational as raid
} disk 0
} >> Sep 17 16:02:52 tera kernel: raid5: allocated 5268kB for md0
} >> Sep 17 16:02:52 tera kernel: raid5: raid level 5 set md0 active with 4
} >> out of 5 devices, algorithm 2
} >> Sep 17 16:02:52 tera kernel: RAID5 conf printout:
} >> Sep 17 16:02:52 tera kernel: --- rd:5 wd:4
} >> Sep 17 16:02:52 tera kernel: disk 0, o:1, dev:sdb1
} >> Sep 17 16:02:52 tera kernel: disk 1, o:1, dev:sdc1
} >> Sep 17 16:02:52 tera kernel: disk 2, o:1, dev:sdd1
} >> Sep 17 16:02:52 tera kernel: disk 3, o:1, dev:sde1
} >>
} >>
} >>
} >> Am I doing something wrong?
} >>
} >>
} >>
} >>
} >> On Thu, Sep 17, 2009 at 2:22 PM, Robin Hill <robin@xxxxxxxxxxxxxxx>
} wrote:
} >>> On Thu Sep 17, 2009 at 01:42:30PM -0700, Tim Bostrom wrote:
} >>>
} >>>> OK,
} >>>>
} >>>> Let me start off by saying - I panicked.  Rule #1 - don't panic.  I
} >>>> did.  Sorry.
} >>>>
} >>>> I have a RAID 5 array running on Fedora 10.
} >>>> (Linux tera.teambostrom.com 2.6.27.30-170.2.82.fc10.i686 #1 SMP Mon
} >>>> Aug 17 08:38:59 EDT 2009 i686 athlon i386 GNU/Linux)
} >>>>
} >>>> 5 drives in an external enclosure (AMS eSATA Venus T5).  It's a
} >>>> Sil4726 inside the enclosure running to a Sil3132 controller via
} eSATA
} >>>> in the desktop.  I had been running this setup for just over a year.
} >>>> Was working fine.   I just moved into a new home and had my server
} >>>> down for a while  - before I brought it back online, I got a "great
} >>>> idea" to blow out the dust from the enclosure using compressed air.
} >>>> When I finally brought up the array again, I noticed that drives were
} >>>> missing.  Tried re-adding the drives to the array and had some issues
} >>>> - they seemed to get added but after a short time of rebuilding the
} >>>> array, I would get a bunch of HW resets in dmesg and then the array
} >>>> would kick out drives and stop.
} >>>>
} >>> <- much snippage ->
} >>>
} >>>> I popped the drives out of the enclosure and into the actual tower
} >>>> case and connected each of them to its own SATA port.  The HW resets
} >>>> seemed to go away, but I couldn't get the array to come back online.
} >>>>  Then I did the stupid panic (following someone's advice I shouldn't
} >>>> have).
} >>>>
} >>>> thinking I should just re-create the array, I did:
} >>>>
} >>>> mdadm --create /dev/md0 --level=5 --raid-devices=5 /dev/sd[b-f]1
} >>>>
} >>>> Stupid me again - ignores the warning that it belongs to an array
} >>>> already.  I let it build for a minute or so and then tried to mount
} it
} >>>> while rebuilding... and got error messages:
} >>>>
} >>>> EXT3-fs: unable to read superblock
} >>>> EXT3-fs: md0: couldn't mount because of unsupported optional features
} >>>> (3fd18e00).
} >>>>
} >>>> Now - I'm at a loss.  I'm afraid to do anything else.   I've been
} >>>> viewing the FAQ and I have a few ideas, but I'm just more freaked.
}  Is
} >>>> there any hope?  What should I do next without causing more trouble?
} >>>>
} >>> Looking at the mdadm output, there's a couple of possible errors.
} >>> Firstly, your newly created array has a different chunksize than your
} >>> original one.  Secondly, the drives may be in the wrong order.  In
} >>> either case, providing you don't _actually_ have any faulty drives,
} then
} >>> it should be (mostly) recoverable.
} >>>
} >>> Given the order you specified the drives in the create, sdf1 will be
} the
} >>> partition that's been trashed by the rebuild, so you'll want to leave
} >>> that out altogether for now.
} >>>
} >>> You need to try to recreate the array with the correct chunk size and
} >>> with the remaining drives in different orders, running a read-only
} >>> filesystem check each time until you find the correct order.
} >>>
} >>> So start with:
} >>>    mdadm -C /dev/md0 -l 5 -n 5 -c 256 /dev/sd[bcde]1 missing
} >>>
} >>> Then repeat for every possible order of the four disks and "missing",
} >>> stopping the array each time if the mount fails.
} >>>
} >>> When you've finally found the correct order, you can re-add sdf1 to
} get
} >>> the array back to normal.
} >>>
} >>> HTH,
} >>>    Robin
} >>> --
} >>>     ___
} >>>    ( ' }     |       Robin Hill        <robin@xxxxxxxxxxxxxxx> |
} >>>   / / )      | Little Jim says ....                            |
} >>>  // !!       |      "He fallen in de water !!"                 |
} >>>
} >>
} >>
} >>
} >> --
} >> -tim
} >> --
} >> To unsubscribe from this list: send the line "unsubscribe linux-raid"
} in
} >> the body of a message to majordomo@xxxxxxxxxxxxxxx
} >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
} >>
} >
} >
} >
} > --
} >       Majed B.
} >
} 
} 
} 
} --
} -tim
} --
} To unsubscribe from this list: send the line "unsubscribe linux-raid" in
} the body of a message to majordomo@xxxxxxxxxxxxxxx
} More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
