I just noticed that my bootable flag is set on two of the disks.  Would
that cause any issue?

Disk /dev/sdb: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x000e1d5a

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1   *           1      121601   976760001   fd  Linux raid autodetect

Disk /dev/sdc: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x323eeffc

   Device Boot      Start         End      Blocks   Id  System
/dev/sdc1               1      121601   976760001   fd  Linux raid autodetect

Disk /dev/sdd: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0xd98df0ac

   Device Boot      Start         End      Blocks   Id  System
/dev/sdd1               1      121601   976760001   fd  Linux raid autodetect

Disk /dev/sde: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x00000000

   Device Boot      Start         End      Blocks   Id  System
/dev/sde1               1      121601   976760001   fd  Linux raid autodetect

Disk /dev/sdf: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x0004c8a2

   Device Boot      Start         End      Blocks   Id  System
/dev/sdf1   *           1      121601   976760001   fd  Linux raid autodetect

[root@tera tbostrom]#

On Thu, Sep 17, 2009 at 6:31 PM, Guy Watkins <linux-raid@xxxxxxxxxxxxxxxx> wrote:
> It is the way you list the drives.  Look at this command:
> # echo /dev/sd[bdce]1
> /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1
>
> Notice the output is not in the same order as in the command.  You should
> list each disk in the order you want.  Like this:
> mdadm -C /dev/md0 -l 5 -n 5 -c 256 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 missing
>
> I hope this helps.
>
> } -----Original Message-----
> } From: linux-raid-owner@xxxxxxxxxxxxxxx [mailto:linux-raid-owner@xxxxxxxxxxxxxxx]
> } On Behalf Of Tim Bostrom
> } Sent: Thursday, September 17, 2009 7:55 PM
> } To: linux-raid
> } Subject: Re: RAID 5 array recovery - two drives errors in external enclosure
> }
> } It's still showing the order that you had previously posted: [bcde]
> } (see log below)
> }
> } It appears that trying different permutations isn't yielding any
> } change.  I haven't tried every permutation, but are these commands
> } supposed to yield different effects?  They seem to always build the
> } array as [bcde] no matter what.  Or should I be swapping around the
> } cables on the drives?
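[A quick illustration of Guy's glob point, using the device names from this
thread; any Bourne-style shell behaves the same way.  The ordering inside the
brackets never reaches mdadm, because the shell sorts the expansion first,
which is why the three commands quoted just below are all equivalent:]

# Bracket globs are expanded by the shell in sorted order, so these all
# produce the identical argument list:
echo /dev/sd[bdce]1    # -> /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1
echo /dev/sd[bdec]1    # -> /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1
echo /dev/sd[becd]1    # -> /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1
# To try a different member order, every device has to be listed explicitly,
# e.g. (the order here is only an example):
#   mdadm -C /dev/md0 -l 5 -n 5 -c 256 /dev/sdc1 /dev/sdb1 /dev/sdd1 /dev/sde1 missing

[This is also why the kernel logs further down keep reporting the same
b,c,d,e order no matter which permutation was typed.]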
> }
> } >> mdadm -C /dev/md0 -l 5 -n 5 -c 256 /dev/sd[bdce]1 missing
> } >> mdadm -C /dev/md0 -l 5 -n 5 -c 256 /dev/sd[bdec]1 missing
> } >> mdadm -C /dev/md0 -l 5 -n 5 -c 256 /dev/sd[becd]1 missing
> }
> } -Tim
> }
> } [root@tera ~]# mdadm --examine /dev/sdb1
> } /dev/sdb1:
> }           Magic : a92b4efc
> }         Version : 0.90.00
> }            UUID : 9fefb6ce:dcbfe649:f456b3f0:371e8bcc
> }   Creation Time : Thu Sep 17 16:13:45 2009
> }      Raid Level : raid5
> }   Used Dev Size : 976759808 (931.51 GiB 1000.20 GB)
> }      Array Size : 3907039232 (3726.04 GiB 4000.81 GB)
> }    Raid Devices : 5
> }   Total Devices : 5
> } Preferred Minor : 0
> }
> }     Update Time : Thu Sep 17 16:13:45 2009
> }           State : clean
> }  Active Devices : 4
> } Working Devices : 4
> }  Failed Devices : 1
> }   Spare Devices : 0
> }        Checksum : 20f1deab - correct
> }          Events : 1
> }
> }          Layout : left-symmetric
> }      Chunk Size : 256K
> }
> }       Number   Major   Minor   RaidDevice State
> } this     0       8       17        0      active sync   /dev/sdb1
> }
> }    0     0       8       17        0      active sync   /dev/sdb1
> }    1     1       8       33        1      active sync   /dev/sdc1
> }    2     2       8       49        2      active sync   /dev/sdd1
> }    3     3       8       65        3      active sync   /dev/sde1
> }    4     4       0        0        4      faulty
> }
> } On Thu, Sep 17, 2009 at 4:28 PM, Majed B. <majedb@xxxxxxxxx> wrote:
> } > Before creating the array, did you re-examine the disks with mdadm and
> } > made sure of each disk's position in the array?
> } >
> } > After your recabling, the disk names may have changed again.
> } >
> } > mdadm --examine /dev/sdb1
> } >
> } >      Number   Major   Minor   RaidDevice State
> } > this     7       8       17        7      active sync   /dev/sdb1
> } >
> } >    0     0       8      113        0      active sync   /dev/sdh1
> } >    1     1       8       97        1      active sync   /dev/sdg1
> } >    2     2       0        0        2      faulty removed
> } >    3     3       0        0        3      faulty removed
> } >    4     4       8       33        4      active sync   /dev/sdc1
> } >    5     5       8       65        5      active sync   /dev/sde1
> } >    6     6       8       49        6      active sync   /dev/sdd1
> } >    7     7       8       17        7      active sync   /dev/sdb1
> } >
> } > (That's the output of an array I'm working on)
> } >
> } > Notice the first line: *this* and then the value of RaidDevice. That's
> } > the position of the partition in the array. 0 is first, 1 is second,
> } > and so on.
> } >
> } > In my case, the order is: sdh1,sdg1,missing,missing,sdc1,sde1,sdd1,sdb1
> } >
> } > On Fri, Sep 18, 2009 at 2:11 AM, Tim Bostrom <tbostrom@xxxxxxxxx> wrote:
> } >> I re-cabled the drives so that they show up as the same drive letter
> } >> as they were before when in the enclosure.
> } >>
> } >> I then went ahead and tried your idea of restarting the array.
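[A side note on Majed's *this* / RaidDevice explanation above: a short loop
collects every member's claimed slot in one pass.  This is only a sketch,
using the partition names from this thread:]

# Print each member's superblock identity and the slot it believes it holds;
# the RaidDevice value on the "this" line is that partition's position.
for part in /dev/sd[b-f]1; do
    echo "== $part"
    mdadm --examine "$part" | grep -E 'UUID|Creation Time|^this'
done

[Keep in mind that mdadm -C writes fresh superblocks, so output captured
after a re-create, like the Sep 17 16:13:45 example just above, describes
the new array rather than the original one.]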
> } >> I tried this first:
> } >>
> } >> mdadm -C /dev/md0 -l 5 -n 5 -c 256 /dev/sd[bcde]1 missing
> } >>
> } >> mount -o ro /dev/md0 /mnt/teradata
> } >>
> } >> /var/log/messages:
> } >> -----------------
> } >> Sep 17 16:07:09 tera kernel: md: bind<sdb1>
> } >> Sep 17 16:07:09 tera kernel: md: bind<sdc1>
> } >> Sep 17 16:07:09 tera kernel: md: bind<sdd1>
> } >> Sep 17 16:07:09 tera kernel: md: bind<sde1>
> } >> Sep 17 16:07:09 tera kernel: raid5: device sde1 operational as raid disk 3
> } >> Sep 17 16:07:09 tera kernel: raid5: device sdd1 operational as raid disk 2
> } >> Sep 17 16:07:09 tera kernel: raid5: device sdc1 operational as raid disk 1
> } >> Sep 17 16:07:09 tera kernel: raid5: device sdb1 operational as raid disk 0
> } >> Sep 17 16:07:09 tera kernel: raid5: allocated 5268kB for md0
> } >> Sep 17 16:07:09 tera kernel: raid5: raid level 5 set md0 active with 4
> } >> out of 5 devices, algorithm 2
> } >> Sep 17 16:07:09 tera kernel: RAID5 conf printout:
> } >> Sep 17 16:07:09 tera kernel:  --- rd:5 wd:4
> } >> Sep 17 16:07:09 tera kernel:  disk 0, o:1, dev:sdb1
> } >> Sep 17 16:07:09 tera kernel:  disk 1, o:1, dev:sdc1
> } >> Sep 17 16:07:09 tera kernel:  disk 2, o:1, dev:sdd1
> } >> Sep 17 16:07:09 tera kernel:  disk 3, o:1, dev:sde1
> } >> Sep 17 16:07:56 tera kernel: EXT3-fs error (device md0):
> } >> ext3_check_descriptors: Block bitmap for group 8064 not in group
> } >> (block 532677632)!
> } >> Sep 17 16:07:56 tera kernel: EXT3-fs: group descriptors corrupted!
> } >> --------------------------------
> } >>
> } >> I then tried a few more permutations of the command:
> } >> mdadm -C /dev/md0 -l 5 -n 5 -c 256 /dev/sd[bdce]1 missing
> } >> mdadm -C /dev/md0 -l 5 -n 5 -c 256 /dev/sd[bdec]1 missing
> } >> mdadm -C /dev/md0 -l 5 -n 5 -c 256 /dev/sd[becd]1 missing
> } >>
> } >> Every time I changed the order, it would still print the order the
> } >> same in the log:
> } >>
> } >> Sep 17 16:02:52 tera kernel: md: bind<sdb1>
> } >> Sep 17 16:02:52 tera kernel: md: bind<sdc1>
> } >> Sep 17 16:02:52 tera kernel: md: bind<sdd1>
> } >> Sep 17 16:02:52 tera kernel: md: bind<sde1>
> } >> Sep 17 16:02:52 tera kernel: raid5: device sde1 operational as raid disk 3
> } >> Sep 17 16:02:52 tera kernel: raid5: device sdd1 operational as raid disk 2
> } >> Sep 17 16:02:52 tera kernel: raid5: device sdc1 operational as raid disk 1
> } >> Sep 17 16:02:52 tera kernel: raid5: device sdb1 operational as raid disk 0
> } >> Sep 17 16:02:52 tera kernel: raid5: allocated 5268kB for md0
> } >> Sep 17 16:02:52 tera kernel: raid5: raid level 5 set md0 active with 4
> } >> out of 5 devices, algorithm 2
> } >> Sep 17 16:02:52 tera kernel: RAID5 conf printout:
> } >> Sep 17 16:02:52 tera kernel:  --- rd:5 wd:4
> } >> Sep 17 16:02:52 tera kernel:  disk 0, o:1, dev:sdb1
> } >> Sep 17 16:02:52 tera kernel:  disk 1, o:1, dev:sdc1
> } >> Sep 17 16:02:52 tera kernel:  disk 2, o:1, dev:sdd1
> } >> Sep 17 16:02:52 tera kernel:  disk 3, o:1, dev:sde1
> } >>
> } >> Am I doing something wrong?
> } >>
> } >> On Thu, Sep 17, 2009 at 2:22 PM, Robin Hill <robin@xxxxxxxxxxxxxxx> wrote:
> } >>> On Thu Sep 17, 2009 at 01:42:30PM -0700, Tim Bostrom wrote:
> } >>>
> } >>>> OK,
> } >>>>
> } >>>> Let me start off by saying - I panicked.  Rule #1 - don't panic.  I
> } >>>> did.  Sorry.
> } >>>>
> } >>>> I have a RAID 5 array running on Fedora 10.
> } >>>> (Linux tera.teambostrom.com 2.6.27.30-170.2.82.fc10.i686 #1 SMP Mon
> } >>>> Aug 17 08:38:59 EDT 2009 i686 athlon i386 GNU/Linux)
> } >>>>
> } >>>> 5 drives in an external enclosure (AMS eSATA Venus T5).  It's a
> } >>>> Sil4726 inside the enclosure running to a Sil3132 controller via eSATA
> } >>>> in the desktop.  I had been running this setup for just over a year.
> } >>>> Was working fine.  I just moved into a new home and had my server
> } >>>> down for a while - before I brought it back online, I got a "great
> } >>>> idea" to blow out the dust from the enclosure using compressed air.
> } >>>> When I finally brought up the array again, I noticed that drives were
> } >>>> missing.  Tried re-adding the drives to the array and had some issues
> } >>>> - they seemed to get added but after a short time of rebuilding the
> } >>>> array, I would get a bunch of HW resets in dmesg and then the array
> } >>>> would kick out drives and stop.
> } >>>>
> } >>> <- much snippage ->
> } >>>
> } >>>> I popped the drives out of the enclosure and into the actual tower
> } >>>> case and connected each of them to its own SATA port.  The HW resets
> } >>>> seemed to go away, but I couldn't get the array to come back online.
> } >>>> Then I did the stupid panic (following someone's advice I shouldn't
> } >>>> have).
> } >>>>
> } >>>> Thinking I should just re-create the array, I did:
> } >>>>
> } >>>> mdadm --create /dev/md0 --level=5 --raid-devices=5 /dev/sd[b-f]1
> } >>>>
> } >>>> Stupid me again - I ignored the warning that it belongs to an array
> } >>>> already.  I let it build for a minute or so and then tried to mount it
> } >>>> while rebuilding... and got error messages:
> } >>>>
> } >>>> EXT3-fs: unable to read superblock
> } >>>> EXT3-fs: md0: couldn't mount because of unsupported optional features
> } >>>> (3fd18e00).
> } >>>>
> } >>>> Now - I'm at a loss.  I'm afraid to do anything else.  I've been
> } >>>> viewing the FAQ and I have a few ideas, but I'm just more freaked.  Is
> } >>>> there any hope?  What should I do next without causing more trouble?
> } >>>>
> } >>> Looking at the mdadm output, there's a couple of possible errors.
> } >>> Firstly, your newly created array has a different chunksize than your
> } >>> original one.  Secondly, the drives may be in the wrong order.  In
> } >>> either case, providing you don't _actually_ have any faulty drives, then
> } >>> it should be (mostly) recoverable.
> } >>>
> } >>> Given the order you specified the drives in the create, sdf1 will be the
> } >>> partition that's been trashed by the rebuild, so you'll want to leave
> } >>> that out altogether for now.
> } >>>
> } >>> You need to try to recreate the array with the correct chunk size and
> } >>> with the remaining drives in different orders, running a read-only
> } >>> filesystem check each time until you find the correct order.
> } >>>
> } >>> So start with:
> } >>>     mdadm -C /dev/md0 -l 5 -n 5 -c 256 /dev/sd[bcde]1 missing
> } >>>
> } >>> Then repeat for every possible order of the four disks and "missing",
> } >>> stopping the array each time if the mount fails.
> } >>>
> } >>> When you've finally found the correct order, you can re-add sdf1 to get
> } >>> the array back to normal.
> } >>>
> } >>> HTH,
> } >>>     Robin
> } >>> --
> } >>>      ___
> } >>>     ( ' }     |       Robin Hill        <robin@xxxxxxxxxxxxxxx> |
> } >>>    / / )      |    Little Jim says ....                         |
> } >>>   // !!       |      "He fallen in de water !!"                 |
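[Putting Robin's "repeat for every possible order" step together with Guy's
later point about glob expansion: each attempt has to spell the four members
out explicitly, since /dev/sd[bcde]1 and its permutations all expand to the
same sorted list.  A minimal sketch, assuming the 256K chunk size and the
/mnt/teradata mount point quoted above, and assuming every printed command is
reviewed before it is actually run:]

#!/bin/sh
# Sketch only: prints the 24 candidate orderings plus the read-only checks;
# it does not run anything itself.  "missing" stands in for the overwritten
# sdf1, as suggested above.
for a in sdb1 sdc1 sdd1 sde1; do
  for b in sdb1 sdc1 sdd1 sde1; do
    for c in sdb1 sdc1 sdd1 sde1; do
      for d in sdb1 sdc1 sdd1 sde1; do
        # skip orderings that repeat a partition
        [ "$(printf '%s\n' $a $b $c $d | sort -u | wc -l)" -eq 4 ] || continue
        echo "mdadm -C /dev/md0 -l 5 -n 5 -c 256 /dev/$a /dev/$b /dev/$c /dev/$d missing"
        echo "fsck.ext3 -n /dev/md0"
        echo "mount -o ro /dev/md0 /mnt/teradata || mdadm --stop /dev/md0"
        echo
      done
    done
  done
done

[fsck.ext3 -n answers "no" to everything and the mount is read-only, so a
wrong ordering should only cost the time of another degraded create; once an
order mounts cleanly, sdf1 can be re-added as Robin describes.]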
> } >>>
> } >>
> } >> --
> } >> -tim
> } >
> } > --
> } >        Majed B.
> }
> } --
> } -tim
>
--
-tim
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html