Re: RAID 5 array recovery - two drive errors in external enclosure

OK - I was just about to ask how you both knew that the array was out of order.


Thank you again.

-Tim

On Thu, Sep 17, 2009 at 3:11 PM, Majed B. <majedb@xxxxxxxxx> wrote:
> If you run mdadm --examine /dev/sda you'll be able to see the disks'
> order in the array (and the position of the disk you're
> querying/examining). The faulty one was previously known as sdf. You
> can find its new name by running --examine on all disks; the one whose
> superblock still claims that all disks are healthy is sdf (its record
> is stale, since it stopped being updated when it dropped out).
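>
> As a rough sketch (this assumes the members are still the sd[b-f]1
> partitions from your earlier mails; adjust the names if they've moved
> again), something like this dumps each member's superblock so you can
> compare what every disk thinks the order and state of the array is:
>
>     for d in /dev/sd[b-f]1; do
>         echo "=== $d ==="
>         mdadm --examine "$d"
>     done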
>
> On Fri, Sep 18, 2009 at 12:51 AM, Tim Bostrom <tbostrom@xxxxxxxxx> wrote:
>> I think the direct SATA connections ended up making them get reversed.
>> sdb = sdf now
>> sdc = sde now
>> ..... I think....
>>
>> I labeled the drives as I pulled them out of the enclosure... I'll
>> make sure they match up and then try your suggestions.  I just now
>> noticed the chunk size issue as well.  <ugh>
>>
>>
>> -Tim
>>
>>
>> On Thu, Sep 17, 2009 at 2:35 PM, Majed B. <majedb@xxxxxxxxx> wrote:
>>> Looking at your initial examine output, it seems like the proper order is: bdce.
>>>
>>> If the hardware resets have gone away after plugging the drives into a
>>> normal PC case with different SATA cables, then I'd say the cables in
>>> your external enclosure are the prime suspect here.
>>>
>>> As Robin said, make sure you have the disks in their original order
>>> and that the chunk size is the same as before.
>>>
>>> This should do it (the devices are spelled out because a [bdce] glob
>>> would expand alphabetically and lose the order):
>>>   mdadm -C /dev/md0 -l 5 -n 5 -c 256 /dev/sdb1 /dev/sdd1 /dev/sdc1 /dev/sde1 missing
>>> (notice the order)
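>>>
>>> Before trusting the result, check it read-only first; a quick sketch,
>>> assuming ext3 sits directly on /dev/md0 (as your mount errors suggest)
>>> and using /mnt only as a placeholder mount point:
>>>
>>>     fsck.ext3 -n /dev/md0       # -n: check only, answer "no" to all fixes
>>>     mount -o ro /dev/md0 /mnt   # optional read-only test mount
>>>
>>> If it looks wrong, stop the array with "mdadm -S /dev/md0" before
>>> trying another order.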
>>>
>>> On Fri, Sep 18, 2009 at 12:22 AM, Robin Hill <robin@xxxxxxxxxxxxxxx> wrote:
>>>> On Thu Sep 17, 2009 at 01:42:30PM -0700, Tim Bostrom wrote:
>>>>
>>>>> OK,
>>>>>
>>>>> Let me start off by saying - I panicked.  Rule #1 - don't panic.  I
>>>>> did.  Sorry.
>>>>>
>>>>> I have a RAID 5 array running on Fedora 10.
>>>>> (Linux tera.teambostrom.com 2.6.27.30-170.2.82.fc10.i686 #1 SMP Mon
>>>>> Aug 17 08:38:59 EDT 2009 i686 athlon i386 GNU/Linux)
>>>>>
>>>>> 5 drives in an external enclosure (AMS eSATA Venus T5).  It's a
>>>>> Sil4726 inside the enclosure running to a Sil3132 controller via eSATA
>>>>> in the desktop.  I had been running this setup for just over a year.
>>>>> It was working fine.  I just moved into a new home and had my server
>>>>> down for a while.  Before I brought it back online, I got a "great
>>>>> idea" to blow out the dust from the enclosure using compressed air.
>>>>> When I finally brought up the array again, I noticed that drives were
>>>>> missing.  I tried re-adding the drives to the array and had some issues
>>>>> - they seemed to get added, but after a short time of rebuilding the
>>>>> array I would get a bunch of HW resets in dmesg, and then the array
>>>>> would kick out drives and stop.
>>>>>
>>>> <- much snippage ->
>>>>
>>>>> I popped the drives out of the enclosure and into the actual tower
>>>>> case and connected each of them to its own SATA port.  The HW resets
>>>>> seemed to go away, but I couldn't get the array to come back online.
>>>>>  Then I did the stupid panic (following someone's advice I shouldn't
>>>>> have).
>>>>>
>>>>> Thinking I should just re-create the array, I did:
>>>>>
>>>>> mdadm --create /dev/md0 --level=5 --raid-devices=5 /dev/sd[b-f]1
>>>>>
>>>>> Stupid me again - I ignored the warning that they already belonged to
>>>>> an array.  I let it build for a minute or so and then tried to mount it
>>>>> while rebuilding... and got error messages:
>>>>>
>>>>> EXT3-fs: unable to read superblock
>>>>> EXT3-fs: md0: couldn't mount because of unsupported optional features
>>>>> (3fd18e00).
>>>>>
>>>>> Now - I'm at a loss.  I'm afraid to do anything else.   I've been
>>>>> viewing the FAQ and I have a few ideas, but I'm just more freaked.  Is
>>>>> there any hope?  What should I do next without causing more trouble?
>>>>>
>>>> Looking at the mdadm output, there are a couple of possible errors.
>>>> Firstly, your newly created array has a different chunk size than your
>>>> original one.  Secondly, the drives may be in the wrong order.  In
>>>> either case, providing you don't _actually_ have any faulty drives, it
>>>> should be (mostly) recoverable.
>>>>
>>>> Given the order you specified the drives in the create, sdf1 will be the
>>>> partition that's been trashed by the rebuild, so you'll want to leave
>>>> that out altogether for now.
>>>>
>>>> You need to try to recreate the array with the correct chunk size and
>>>> with the remaining drives in different orders, running a read-only
>>>> filesystem check each time until you find the correct order.
>>>>
>>>> So start with:
>>>>    mdadm -C /dev/md0 -l 5 -n 5 -c 256 /dev/sd[bcde]1 missing
>>>>
>>>> Then repeat for every possible order of the four disks and "missing",
>>>> stopping the array each time if the mount fails.
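>>>>
>>>> A rough sketch of that loop (bash; it assumes ext3 directly on
>>>> /dev/md0, treats a clean "fsck -n" as "order found", and passes --run
>>>> only so mdadm doesn't stop to ask about the old superblocks):
>>>>
>>>>     #!/bin/bash
>>>>     for p1 in b c d e; do
>>>>      for p2 in b c d e; do
>>>>       for p3 in b c d e; do
>>>>        for p4 in b c d e; do
>>>>         # skip anything that isn't a true permutation of b c d e
>>>>         n=$(echo "$p1 $p2 $p3 $p4" | tr ' ' '\n' | sort -u | wc -l)
>>>>         [ "$n" -eq 4 ] || continue
>>>>         echo "Trying order: $p1 $p2 $p3 $p4 missing"
>>>>         mdadm -C /dev/md0 -l 5 -n 5 -c 256 --run \
>>>>             /dev/sd${p1}1 /dev/sd${p2}1 /dev/sd${p3}1 /dev/sd${p4}1 missing
>>>>         if fsck.ext3 -n /dev/md0; then
>>>>             echo "Order $p1 $p2 $p3 $p4 missing looks sane"
>>>>             exit 0
>>>>         fi
>>>>         mdadm -S /dev/md0
>>>>        done
>>>>       done
>>>>      done
>>>>     done
>>>>
>>>> This only shuffles the four disks and keeps "missing" in the last
>>>> slot; if none of those pass, "missing" may need to move to a
>>>> different position too.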
>>>>
>>>> When you've finally found the correct order, you can re-add sdf1 to get
>>>> the array back to normal.
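>>>>
>>>> That last step would look something like this (assuming the disk still
>>>> shows up as sdf once you've identified it):
>>>>
>>>>    mdadm /dev/md0 --add /dev/sdf1
>>>>
>>>> and md will then rebuild the missing member onto it.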
>>>>
>>>> HTH,
>>>>    Robin
>>>> --
>>>>     ___
>>>>    ( ' }     |       Robin Hill        <robin@xxxxxxxxxxxxxxx> |
>>>>   / / )      | Little Jim says ....                            |
>>>>  // !!       |      "He fallen in de water !!"                 |
>>>>
>>>
>>>
>>>
>>> --
>>>       Majed B.
>>
>>
>>
>> --
>> -tim
>
>
>
> --
>       Majed B.
>



-- 
-tim
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
