Re: RAID 5 array recovery - two drives errors in external enclosure

I just noticed that the bootable flag is set on two of the disks.
Would that cause any issue?

Disk /dev/sdb: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x000e1d5a

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1   *           1      121601   976760001   fd  Linux raid autodetect

Disk /dev/sdc: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x323eeffc

   Device Boot      Start         End      Blocks   Id  System
/dev/sdc1               1      121601   976760001   fd  Linux raid autodetect

Disk /dev/sdd: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0xd98df0ac

   Device Boot      Start         End      Blocks   Id  System
/dev/sdd1               1      121601   976760001   fd  Linux raid autodetect

Disk /dev/sde: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x00000000

   Device Boot      Start         End      Blocks   Id  System
/dev/sde1               1      121601   976760001   fd  Linux raid autodetect

Disk /dev/sdf: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x0004c8a2

   Device Boot      Start         End      Blocks   Id  System
/dev/sdf1   *           1      121601   976760001   fd  Linux raid autodetect
[root@tera tbostrom]#
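
As far as I know the bootable flag is ignored by md: in-kernel autodetect
keys off the 0xfd partition type and the RAID superblock, so having it set
on two of the disks should be harmless.  If you want to clear it anyway,
here is a minimal sketch using parted, assuming the flagged partitions
really are sdb1 and sdf1 as in the listing above:

# Clear the bootable flag on partition 1 of each disk; this only rewrites
# the flag byte in the MBR and does not touch the partition contents.
parted /dev/sdb set 1 boot off
parted /dev/sdf set 1 boot off

# Re-read the partition tables to confirm the flag is gone.
fdisk -l /dev/sdb /dev/sdf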


On Thu, Sep 17, 2009 at 6:31 PM, Guy Watkins
<linux-raid@xxxxxxxxxxxxxxxx> wrote:
> It is the way you list the drives.  Look at this command:
> # echo /dev/sd[bdce]1
> /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1
>
> Notice the output is not in the same order as in the command.  You should
> list each disk in the order you want.  Like this:
> mdadm -C /dev/md0 -l 5 -n 5 -c 256 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1
> missing
>
> I hope this helps.
>
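
A minimal sketch of the same idea with the device names from this thread:
spell out every member explicitly (so the shell cannot reorder a glob) and
then confirm which device landed in which slot before trying to mount:

# Stop md0 if it is currently assembled, then re-create the degraded array
# with an explicit member order, leaving the fifth slot missing.  mdadm
# will warn that the members already carry a superblock; answer y.
mdadm --stop /dev/md0
mdadm -C /dev/md0 -l 5 -n 5 -c 256 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 missing

# Confirm the device-to-slot mapping that was actually used.
mdadm --detail /dev/md0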
> } -----Original Message-----
> } From: linux-raid-owner@xxxxxxxxxxxxxxx [mailto:linux-raid-
> } owner@xxxxxxxxxxxxxxx] On Behalf Of Tim Bostrom
> } Sent: Thursday, September 17, 2009 7:55 PM
> } To: linux-raid
> } Subject: Re: RAID 5 array recovery - two drives errors in external
> } enclosure
> }
> } It's still showing the order that you had previously posted:  [bcde]
> } (see log below)
> }
> } It appears that trying different permutations isn't yielding any
> } change.  I haven't tried every permutation, but are these commands
> } supposed to yield different effects?  They seem to always build the
> } array as [bcde] no matter what.  Or should I be swapping around the
> } cables on the drives?
> }
> } >> mdadm -C /dev/md0 -l 5 -n 5 -c 256 /dev/sd[bdce]1 missing
> } >> mdadm -C /dev/md0 -l 5 -n 5 -c 256 /dev/sd[bdec]1 missing
> } >> mdadm -C /dev/md0 -l 5 -n 5 -c 256 /dev/sd[becd]1 missing
> }
> }
> } -Tim
> }
> } [root@tera ~]# mdadm --examine /dev/sdb1
> } /dev/sdb1:
> }           Magic : a92b4efc
> }         Version : 0.90.00
> }            UUID : 9fefb6ce:dcbfe649:f456b3f0:371e8bcc
> }   Creation Time : Thu Sep 17 16:13:45 2009
> }      Raid Level : raid5
> }   Used Dev Size : 976759808 (931.51 GiB 1000.20 GB)
> }      Array Size : 3907039232 (3726.04 GiB 4000.81 GB)
> }    Raid Devices : 5
> }   Total Devices : 5
> } Preferred Minor : 0
> }
> }     Update Time : Thu Sep 17 16:13:45 2009
> }           State : clean
> }  Active Devices : 4
> } Working Devices : 4
> }  Failed Devices : 1
> }   Spare Devices : 0
> }        Checksum : 20f1deab - correct
> }          Events : 1
> }
> }          Layout : left-symmetric
> }      Chunk Size : 256K
> }
> }       Number   Major   Minor   RaidDevice State
> } this     0       8       17        0      active sync   /dev/sdb1
> }
> }    0     0       8       17        0      active sync   /dev/sdb1
> }    1     1       8       33        1      active sync   /dev/sdc1
> }    2     2       8       49        2      active sync   /dev/sdd1
> }    3     3       8       65        3      active sync   /dev/sde1
> }    4     4       0        0        4      faulty
> }
> }
> }
> } On Thu, Sep 17, 2009 at 4:28 PM, Majed B. <majedb@xxxxxxxxx> wrote:
> } > Before creating the array, did you re-examine the disks with mdadm and
> } > make sure of each disk's position in the array?
> } >
> } > After your recabling, the disk names may have changed again.
> } >
> } > mdadm --examine /dev/sdb1
> } >
> } >      Number   Major   Minor   RaidDevice State
> } > this     7       8       17        7      active sync   /dev/sdb1
> } >
> } >   0     0       8      113        0      active sync   /dev/sdh1
> } >   1     1       8       97        1      active sync   /dev/sdg1
> } >   2     2       0        0        2      faulty removed
> } >   3     3       0        0        3      faulty removed
> } >   4     4       8       33        4      active sync   /dev/sdc1
> } >   5     5       8       65        5      active sync   /dev/sde1
> } >   6     6       8       49        6      active sync   /dev/sdd1
> } >   7     7       8       17        7      active sync   /dev/sdb1
> } >
> } > (That's the output of an array I'm working on)
> } >
> } > Notice the first line: *this* and then the value of RaidDevice. That's
> } > the position of the partition in the array. 0 is first, 1 is second,
> } > and so on.
> } >
> } > In my case, the order is: sdh1,sdg1,missing,missing,sdc1,sde1,sdd1,sdb1
> } >
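
A quick way to collect that information for every member at once, as a
sketch with the device names assumed from this thread:

# Print the claimed slot (the "this" line), array UUID and chunk size of
# each member partition, so the intended order can be read off in one pass.
for d in /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1; do
    echo "== $d"
    mdadm --examine "$d" | grep -E 'UUID|Chunk Size|^this'
done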
> } > On Fri, Sep 18, 2009 at 2:11 AM, Tim Bostrom <tbostrom@xxxxxxxxx> wrote:
> } >> I re-cabled the drives so that they show up with the same drive letters
> } >> as they had before, when they were in the enclosure.
> } >>
> } >> I then went ahead and tried your idea of restarting the array. I tried
> } >> this first:
> } >>
> } >> mdadm -C /dev/md0 -l 5 -n 5 -c 256 /dev/sd[bcde]1 missing
> } >>
> } >> mount -o ro /dev/md0 /mnt/teradata
> } >>
> } >> /var/log/messages:
> } >> -----------------
> } >> Sep 17 16:07:09 tera kernel: md: bind<sdb1>
> } >> Sep 17 16:07:09 tera kernel: md: bind<sdc1>
> } >> Sep 17 16:07:09 tera kernel: md: bind<sdd1>
> } >> Sep 17 16:07:09 tera kernel: md: bind<sde1>
> } >> Sep 17 16:07:09 tera kernel: raid5: device sde1 operational as raid disk 3
> } >> Sep 17 16:07:09 tera kernel: raid5: device sdd1 operational as raid disk 2
> } >> Sep 17 16:07:09 tera kernel: raid5: device sdc1 operational as raid disk 1
> } >> Sep 17 16:07:09 tera kernel: raid5: device sdb1 operational as raid disk 0
> } >> Sep 17 16:07:09 tera kernel: raid5: allocated 5268kB for md0
> } >> Sep 17 16:07:09 tera kernel: raid5: raid level 5 set md0 active with 4
> } >> out of 5 devices, algorithm 2
> } >> Sep 17 16:07:09 tera kernel: RAID5 conf printout:
> } >> Sep 17 16:07:09 tera kernel: --- rd:5 wd:4
> } >> Sep 17 16:07:09 tera kernel: disk 0, o:1, dev:sdb1
> } >> Sep 17 16:07:09 tera kernel: disk 1, o:1, dev:sdc1
> } >> Sep 17 16:07:09 tera kernel: disk 2, o:1, dev:sdd1
> } >> Sep 17 16:07:09 tera kernel: disk 3, o:1, dev:sde1
> } >> Sep 17 16:07:56 tera kernel: EXT3-fs error (device md0):
> } >> ext3_check_descriptors: Block bitmap for group 8064 not in group
> } >> (block 532677632)!
> } >> Sep 17 16:07:56 tera kernel: EXT3-fs: group descriptors corrupted!
> } >> --------------------------------
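
That group-descriptor error is the sort of thing a wrong member order tends
to produce.  For a read-only sanity check before (or instead of) mounting a
candidate assembly, something like the following can be used; neither
command writes to the array:

# Dump just the ext3 superblock; obvious garbage here usually means the
# members are assembled in the wrong order or with the wrong chunk size.
dumpe2fs -h /dev/md0

# Full check in "no" mode: report problems but change nothing on disk.
fsck.ext3 -n /dev/md0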
> } >>
> } >>
> } >> I then tried a few more permutations of the command:
> } >> mdadm -C /dev/md0 -l 5 -n 5 -c 256 /dev/sd[bdce]1 missing
> } >> mdadm -C /dev/md0 -l 5 -n 5 -c 256 /dev/sd[bdec]1 missing
> } >> mdadm -C /dev/md0 -l 5 -n 5 -c 256 /dev/sd[becd]1 missing
> } >>
> } >> Every time I changed the order, the log would still show the same
> } >> order:
> } >>
> } >> Sep 17 16:02:52 tera kernel: md: bind<sdb1>
> } >> Sep 17 16:02:52 tera kernel: md: bind<sdc1>
> } >> Sep 17 16:02:52 tera kernel: md: bind<sdd1>
> } >> Sep 17 16:02:52 tera kernel: md: bind<sde1>
> } >> Sep 17 16:02:52 tera kernel: raid5: device sde1 operational as raid disk 3
> } >> Sep 17 16:02:52 tera kernel: raid5: device sdd1 operational as raid disk 2
> } >> Sep 17 16:02:52 tera kernel: raid5: device sdc1 operational as raid disk 1
> } >> Sep 17 16:02:52 tera kernel: raid5: device sdb1 operational as raid disk 0
> } >> Sep 17 16:02:52 tera kernel: raid5: allocated 5268kB for md0
> } >> Sep 17 16:02:52 tera kernel: raid5: raid level 5 set md0 active with 4
> } >> out of 5 devices, algorithm 2
> } >> Sep 17 16:02:52 tera kernel: RAID5 conf printout:
> } >> Sep 17 16:02:52 tera kernel: --- rd:5 wd:4
> } >> Sep 17 16:02:52 tera kernel: disk 0, o:1, dev:sdb1
> } >> Sep 17 16:02:52 tera kernel: disk 1, o:1, dev:sdc1
> } >> Sep 17 16:02:52 tera kernel: disk 2, o:1, dev:sdd1
> } >> Sep 17 16:02:52 tera kernel: disk 3, o:1, dev:sde1
> } >>
> } >>
> } >>
> } >> Am I doing something wrong?
> } >>
> } >>
> } >>
> } >>
> } >> On Thu, Sep 17, 2009 at 2:22 PM, Robin Hill <robin@xxxxxxxxxxxxxxx> wrote:
> } >>> On Thu Sep 17, 2009 at 01:42:30PM -0700, Tim Bostrom wrote:
> } >>>
> } >>>> OK,
> } >>>>
> } >>>> Let me start off by saying - I panicked.  Rule #1 - don't panic.  I
> } >>>> did.  Sorry.
> } >>>>
> } >>>> I have a RAID 5 array running on Fedora 10.
> } >>>> (Linux tera.teambostrom.com 2.6.27.30-170.2.82.fc10.i686 #1 SMP Mon
> } >>>> Aug 17 08:38:59 EDT 2009 i686 athlon i386 GNU/Linux)
> } >>>>
> } >>>> 5 drives in an external enclosure (AMS eSATA Venus T5).  It's a
> } >>>> Sil4726 inside the enclosure running to a Sil3132 controller via eSATA
> } >>>> in the desktop.  I had been running this setup for just over a year.
> } >>>> Was working fine.   I just moved into a new home and had my server
> } >>>> down for a while  - before I brought it back online, I got a "great
> } >>>> idea" to blow out the dust from the enclosure using compressed air.
> } >>>> When I finally brought up the array again, I noticed that drives were
> } >>>> missing.  Tried re-adding the drives to the array and had some issues
> } >>>> - they seemed to get added but after a short time of rebuilding the
> } >>>> array, I would get a bunch of HW resets in dmesg and then the array
> } >>>> would kick out drives and stop.
> } >>>>
> } >>> <- much snippage ->
> } >>>
> } >>>> I popped the drives out of the enclosure and into the actual tower
> } >>>> case and connected each of them to its own SATA port.  The HW resets
> } >>>> seemed to go away, but I couldn't get the array to come back online.
> } >>>>  Then I did the stupid panic (following someone's advice I shouldn't
> } >>>> have).
> } >>>>
> } >>>> Thinking I should just re-create the array, I did:
> } >>>>
> } >>>> mdadm --create /dev/md0 --level=5 --raid-devices=5 /dev/sd[b-f]1
> } >>>>
> } >>>> Stupid me again - I ignored the warning that it already belongs to an
> } >>>> array.  I let it build for a minute or so and then tried to mount it
> } >>>> while rebuilding... and got error messages:
> } >>>>
> } >>>> EXT3-fs: unable to read superblock
> } >>>> EXT3-fs: md0: couldn't mount because of unsupported optional features
> } >>>> (3fd18e00).
> } >>>>
> } >>>> Now - I'm at a loss.  I'm afraid to do anything else.   I've been
> } >>>> viewing the FAQ and I have a few ideas, but I'm just more freaked.  Is
> } >>>> there any hope?  What should I do next without causing more trouble?
> } >>>>
> } >>> Looking at the mdadm output, there are a couple of possible errors.
> } >>> Firstly, your newly created array has a different chunk size from your
> } >>> original one.  Secondly, the drives may be in the wrong order.  In
> } >>> either case, providing you don't _actually_ have any faulty drives, then
> } >>> it should be (mostly) recoverable.
> } >>>
> } >>> Given the order you specified the drives in the create, sdf1 will be the
> } >>> partition that's been trashed by the rebuild, so you'll want to leave
> } >>> that out altogether for now.
> } >>>
> } >>> You need to try to recreate the array with the correct chunk size and
> } >>> with the remaining drives in different orders, running a read-only
> } >>> filesystem check each time until you find the correct order.
> } >>>
> } >>> So start with:
> } >>>    mdadm -C /dev/md0 -l 5 -n 5 -c 256 /dev/sd[bcde]1 missing
> } >>>
> } >>> Then repeat for every possible order of the four disks and "missing",
> } >>> stopping the array each time if the mount fails.
> } >>>
> } >>> When you've finally found the correct order, you can re-add sdf1 to get
> } >>> the array back to normal.
> } >>>
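
A sketch of how that trial-and-check loop could be scripted, assuming the
chunk size and device names from this thread and an ext3 filesystem; fsck
runs with -n so nothing is written while hunting for the right order:

#!/bin/sh
# Try one candidate member order: stop any existing md0, re-create the
# degraded array with that explicit order, then run a read-only fsck.
try_order () {
    mdadm --stop /dev/md0 2>/dev/null
    # --run suppresses the "appears to be part of an array" confirmation.
    mdadm -C /dev/md0 --run -l 5 -n 5 -c 256 "$@" missing || return 1
    # -n opens the filesystem read-only and answers "no" to every repair.
    fsck.ext3 -n /dev/md0
}

# Two example orders; repeat for the remaining permutations of the four
# surviving members (sdf1 stays out until the correct order is found).
try_order /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1
try_order /dev/sdb1 /dev/sdd1 /dev/sdc1 /dev/sde1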
> } >>> HTH,
> } >>>    Robin
> } >>> --
> } >>>     ___
> } >>>    ( ' }     |       Robin Hill        <robin@xxxxxxxxxxxxxxx> |
> } >>>   / / )      | Little Jim says ....                            |
> } >>>  // !!       |      "He fallen in de water !!"                 |
> } >>>
> } >>
> } >>
> } >>
> } >> --
> } >> -tim
> } >>
> } >
> } >
> } >
> } > --
> } >       Majed B.
> } >
> }
> }
> }
> } --
> } -tim
>
>



-- 
-tim
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
