Re: RAID 5 array recovery - two drive errors in external enclosure

It's still showing the order that you had previously posted:  [bcde]
(see log below)

It appears that trying different permutations isn't yielding any
change.  I haven't tried every permutation, but aren't these commands
supposed to produce different device orders?  They always build the
array as [bcde] no matter what.  Or should I be physically swapping
the cables on the drives instead?

>> mdadm -C /dev/md0 -l 5 -n 5 -c 256 /dev/sd[bdce]1 missing
>> mdadm -C /dev/md0 -l 5 -n 5 -c 256 /dev/sd[bdec]1 missing
>> mdadm -C /dev/md0 -l 5 -n 5 -c 256 /dev/sd[becd]1 missing
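
Looking at those again, I suspect the shell is the culprit:
/dev/sd[bdce]1 is a glob, and bash expands globs in sorted order, so
all three commands hand mdadm the devices as b,c,d,e no matter how I
order the letters inside the brackets.  If that's right, I'd need to
spell the devices out explicitly to force a different order, e.g.:

mdadm -C /dev/md0 -l 5 -n 5 -c 256 /dev/sdb1 /dev/sdd1 /dev/sdc1 /dev/sde1 missing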


-Tim

[root@tera ~]# mdadm --examine /dev/sdb1
/dev/sdb1:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 9fefb6ce:dcbfe649:f456b3f0:371e8bcc
  Creation Time : Thu Sep 17 16:13:45 2009
     Raid Level : raid5
  Used Dev Size : 976759808 (931.51 GiB 1000.20 GB)
     Array Size : 3907039232 (3726.04 GiB 4000.81 GB)
   Raid Devices : 5
  Total Devices : 5
Preferred Minor : 0

    Update Time : Thu Sep 17 16:13:45 2009
          State : clean
 Active Devices : 4
Working Devices : 4
 Failed Devices : 1
  Spare Devices : 0
       Checksum : 20f1deab - correct
         Events : 1

         Layout : left-symmetric
     Chunk Size : 256K

      Number   Major   Minor   RaidDevice State
this     0       8       17        0      active sync   /dev/sdb1

   0     0       8       17        0      active sync   /dev/sdb1
   1     1       8       33        1      active sync   /dev/sdc1
   2     2       8       49        2      active sync   /dev/sdd1
   3     3       8       65        3      active sync   /dev/sde1
   4     4       0        0        4      faulty
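
(For the other members, something like this should dump each one's
reported slot in one pass, assuming they're still sdb1 through sde1:

    for d in /dev/sd[b-e]1; do echo "== $d =="; mdadm --examine "$d" | grep '^this'; done

so I can sanity-check all four at once.)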



On Thu, Sep 17, 2009 at 4:28 PM, Majed B. <majedb@xxxxxxxxx> wrote:
> Before creating the array, did you re-examine the disks with mdadm and
> make sure of each disk's position in the array?
>
> After your recabling, the disk names may have changed again.
>
> mdadm --examine /dev/sdb1
>
>      Number   Major   Minor   RaidDevice State
> this     7       8       17        7      active sync   /dev/sdb1
>
>   0     0       8      113        0      active sync   /dev/sdh1
>   1     1       8       97        1      active sync   /dev/sdg1
>   2     2       0        0        2      faulty removed
>   3     3       0        0        3      faulty removed
>   4     4       8       33        4      active sync   /dev/sdc1
>   5     5       8       65        5      active sync   /dev/sde1
>   6     6       8       49        6      active sync   /dev/sdd1
>   7     7       8       17        7      active sync   /dev/sdb1
>
> (That's the output of an array I'm working on)
>
> Notice the first line: *this* and then the value of RaidDevice. That's
> the position of the partition in the array. 0 is first, 1 is second,
> and so on.
>
> In my case, the order is: sdh1,sdg1,missing,missing,sdc1,sde1,sdd1,sdb1
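>
> In your case that means writing the devices out on the command line in
> exactly the slot order that --examine reports, with "missing" holding
> the empty slot - something like:
>
>    mdadm -C /dev/md0 -l 5 -n 5 -c 256 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 missing
>
> (substituting whatever order the superblocks actually show).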
>
> On Fri, Sep 18, 2009 at 2:11 AM, Tim Bostrom <tbostrom@xxxxxxxxx> wrote:
>> I re-cabled the drives so that they show up with the same drive letters
>> as they had before, when they were in the enclosure.
>>
>> I then went ahead and tried your idea of restarting the array. I tried
>> this first:
>>
>> mdadm -C /dev/md0 -l 5 -n 5 -c 256 /dev/sd[bcde]1 missing
>>
>> mount -o ro /dev/md0 /mnt/teradata
>>
>> /var/log/messages:
>> -----------------
>> Sep 17 16:07:09 tera kernel: md: bind<sdb1>
>> Sep 17 16:07:09 tera kernel: md: bind<sdc1>
>> Sep 17 16:07:09 tera kernel: md: bind<sdd1>
>> Sep 17 16:07:09 tera kernel: md: bind<sde1>
>> Sep 17 16:07:09 tera kernel: raid5: device sde1 operational as raid disk 3
>> Sep 17 16:07:09 tera kernel: raid5: device sdd1 operational as raid disk 2
>> Sep 17 16:07:09 tera kernel: raid5: device sdc1 operational as raid disk 1
>> Sep 17 16:07:09 tera kernel: raid5: device sdb1 operational as raid disk 0
>> Sep 17 16:07:09 tera kernel: raid5: allocated 5268kB for md0
>> Sep 17 16:07:09 tera kernel: raid5: raid level 5 set md0 active with 4
>> out of 5 devices, algorithm 2
>> Sep 17 16:07:09 tera kernel: RAID5 conf printout:
>> Sep 17 16:07:09 tera kernel: --- rd:5 wd:4
>> Sep 17 16:07:09 tera kernel: disk 0, o:1, dev:sdb1
>> Sep 17 16:07:09 tera kernel: disk 1, o:1, dev:sdc1
>> Sep 17 16:07:09 tera kernel: disk 2, o:1, dev:sdd1
>> Sep 17 16:07:09 tera kernel: disk 3, o:1, dev:sde1
>> Sep 17 16:07:56 tera kernel: EXT3-fs error (device md0):
>> ext3_check_descriptors: Block bitmap for group 8064 not in group
>> (block 532677632)!
>> Sep 17 16:07:56 tera kernel: EXT3-fs: group descriptors corrupted!
>> --------------------------------
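>>
>> (Between attempts I stopped the array before recreating it, since
>> mdadm won't create over devices that are still in a running array:
>>
>> mdadm -S /dev/md0
>> )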
>>
>>
>> I then tried a few more permutations of the command:
>> mdadm -C /dev/md0 -l 5 -n 5 -c 256 /dev/sd[bdce]1 missing
>> mdadm -C /dev/md0 -l 5 -n 5 -c 256 /dev/sd[bdec]1 missing
>> mdadm -C /dev/md0 -l 5 -n 5 -c 256 /dev/sd[becd]1 missing
>>
>> Every time I changed the order, it would still print the order the
>> same in the log:
>>
>> Sep 17 16:02:52 tera kernel: md: bind<sdb1>
>> Sep 17 16:02:52 tera kernel: md: bind<sdc1>
>> Sep 17 16:02:52 tera kernel: md: bind<sdd1>
>> Sep 17 16:02:52 tera kernel: md: bind<sde1>
>> Sep 17 16:02:52 tera kernel: raid5: device sde1 operational as raid disk 3
>> Sep 17 16:02:52 tera kernel: raid5: device sdd1 operational as raid disk 2
>> Sep 17 16:02:52 tera kernel: raid5: device sdc1 operational as raid disk 1
>> Sep 17 16:02:52 tera kernel: raid5: device sdb1 operational as raid disk 0
>> Sep 17 16:02:52 tera kernel: raid5: allocated 5268kB for md0
>> Sep 17 16:02:52 tera kernel: raid5: raid level 5 set md0 active with 4
>> out of 5 devices, algorithm 2
>> Sep 17 16:02:52 tera kernel: RAID5 conf printout:
>> Sep 17 16:02:52 tera kernel: --- rd:5 wd:4
>> Sep 17 16:02:52 tera kernel: disk 0, o:1, dev:sdb1
>> Sep 17 16:02:52 tera kernel: disk 1, o:1, dev:sdc1
>> Sep 17 16:02:52 tera kernel: disk 2, o:1, dev:sdd1
>> Sep 17 16:02:52 tera kernel: disk 3, o:1, dev:sde1
>>
>>
>>
>> Am I doing something wrong?
>>
>>
>>
>>
>> On Thu, Sep 17, 2009 at 2:22 PM, Robin Hill <robin@xxxxxxxxxxxxxxx> wrote:
>>> On Thu Sep 17, 2009 at 01:42:30PM -0700, Tim Bostrom wrote:
>>>
>>>> OK,
>>>>
>>>> Let me start off by saying - I panicked.  Rule #1 - don't panic.  I
>>>> did.  Sorry.
>>>>
>>>> I have a RAID 5 array running on Fedora 10.
>>>> (Linux tera.teambostrom.com 2.6.27.30-170.2.82.fc10.i686 #1 SMP Mon
>>>> Aug 17 08:38:59 EDT 2009 i686 athlon i386 GNU/Linux)
>>>>
>>>> 5 drives in an external enclosure (AMS eSATA Venus T5).  It's a
>>>> Sil4726 inside the enclosure running to a Sil3132 controller via eSATA
>>>> in the desktop.  I had been running this setup for just over a year
>>>> and it was working fine.  I just moved into a new home and had my
>>>> server down for a while - before bringing it back online, I had the
>>>> "great idea" of blowing the dust out of the enclosure with compressed
>>>> air.  When I finally brought the array up again, I noticed that drives
>>>> were missing.  I tried re-adding them to the array and had some issues
>>>> - they seemed to get added, but after a short time of rebuilding, I
>>>> would get a bunch of HW resets in dmesg, and then the array would kick
>>>> out drives and stop.
>>>>
>>> <- much snippage ->
>>>
>>>> I popped the drives out of the enclosure and into the actual tower
>>>> case and connected each of them to its own SATA port.  The HW resets
>>>> seemed to go away, but I couldn't get the array to come back online.
>>>> Then I panicked and did something stupid (following someone's advice
>>>> I shouldn't have).
>>>>
>>>> Thinking I should just re-create the array, I did:
>>>>
>>>> mdadm --create /dev/md0 --level=5 --raid-devices=5 /dev/sd[b-f]1
>>>>
>>>> Stupid me again - I ignored the warning that the drives already
>>>> belonged to an array.  I let it build for a minute or so and then
>>>> tried to mount it while it was still rebuilding... and got error
>>>> messages:
>>>>
>>>> EXT3-fs: unable to read superblock
>>>> EXT3-fs: md0: couldn't mount because of unsupported optional features
>>>> (3fd18e00).
>>>>
>>>> Now I'm at a loss and afraid to do anything else.  I've been reading
>>>> the FAQ and I have a few ideas, but I'm just more freaked out.  Is
>>>> there any hope?  What should I do next without causing more trouble?
>>>>
>>> Looking at the mdadm output, there are a couple of possible errors.
>>> Firstly, your newly created array has a different chunk size than your
>>> original one.  Secondly, the drives may be in the wrong order.  In
>>> either case, provided you don't _actually_ have any faulty drives,
>>> it should be (mostly) recoverable.
>>>
>>> Given the order you specified the drives in the create, sdf1 will be the
>>> partition that's been trashed by the rebuild, so you'll want to leave
>>> that out altogether for now.
>>>
>>> You need to try to recreate the array with the correct chunk size and
>>> with the remaining drives in different orders, running a read-only
>>> filesystem check each time until you find the correct order.
>>>
>>> So start with:
>>>    mdadm -C /dev/md0 -l 5 -n 5 -c 256 /dev/sd[bcde]1 missing
>>>
>>> Then repeat for every possible order of the four disks and "missing",
>>> stopping the array each time if the mount fails.
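>>>
>>> If you'd rather script it, a rough (untested) sketch along these
>>> lines walks all 24 orders of the four disks, keeping "missing" in
>>> the last slot to start with:
>>>
>>>    for p in $(echo {b,c,d,e}{b,c,d,e}{b,c,d,e}{b,c,d,e} | tr ' ' '\n' |
>>>               grep b | grep c | grep d | grep e); do
>>>        mdadm -S /dev/md0 2>/dev/null
>>>        mdadm -C /dev/md0 --run -l 5 -n 5 -c 256 \
>>>            /dev/sd${p:0:1}1 /dev/sd${p:1:1}1 /dev/sd${p:2:1}1 /dev/sd${p:3:1}1 missing
>>>        fsck.ext3 -n /dev/md0 >/dev/null 2>&1 && echo "candidate order: $p"
>>>    done
>>>
>>> --run skips the "appears to contain a filesystem" confirmation, and
>>> fsck -n is read-only, so nothing gets written while you hunt.
>>> Creating a degraded array with "missing" doesn't trigger a resync
>>> either.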
>>>
>>> When you've finally found the correct order, you can re-add sdf1 to get
>>> the array back to normal.
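>>>
>>> Something like:
>>>
>>>    mdadm --zero-superblock /dev/sdf1
>>>    mdadm --add /dev/md0 /dev/sdf1
>>>
>>> (zeroing the stale superblock first so mdadm treats it as a fresh
>>> spare), and it will rebuild onto sdf1.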
>>>
>>> HTH,
>>>    Robin
>>> --
>>>     ___
>>>    ( ' }     |       Robin Hill        <robin@xxxxxxxxxxxxxxx> |
>>>   / / )      | Little Jim says ....                            |
>>>  // !!       |      "He fallen in de water !!"                 |
>>>
>>
>>
>>
>> --
>> -tim
>
>
>
> --
>       Majed B.
>



-- 
-tim
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
