On Thu, Dec 17, 2009 at 10:07 AM, Majed B. <majedb@xxxxxxxxx> wrote:
> Before you start rebuilding a new array, I suggest you install the smartmontools package and run smartctl -a /dev/sdx (on each disk) and make sure that there are no errors reported.
>
> You might fall into problems if your disks have bad sectors on them.
>
> If your disks don't have any test logs from before, you should run a long or offline test to make sure they're fully tested:
> smartctl -t offline /dev/sdx
>
> And you should configure smartd to monitor and run tests periodically.
>
> On Thu, Dec 17, 2009 at 7:17 PM, Carl Karsten <carl@xxxxxxxxxxxxxxxxx> wrote:
>> On Thu, Dec 17, 2009 at 9:40 AM, Majed B. <majedb@xxxxxxxxx> wrote:
>>> I'm assuming you ran the command with the 2 external disks added to the array.
>>> One question before proceeding: When you removed these 2 externals, were there any changes on the array? Did you add/delete/modify any files or rename them?
>>
>> shutdown the box, unplugged drives, booted box.
>>
>>> What do you mean the 2 externals have had mkfs run on them? Is this AFTER you removed the disks from the array? If so, they're useless now.
>>
>> That's what I figured.
>>
>>> The names of the disks have changed and their names in the superblock are different than what udev is reporting them:
>>> sde now was named sdg
>>> sdf is sdf
>>> sdb is sdb
>>> sdc is sdc
>>> sdd is sdd
>>>
>>> According to the listing above, you have superblock info on: sdb, sdc, sdd, sde, sdf; 5 disks out of 7 -- one of which is a spare.
>>> sdb was a spare and according to other disks' info, it didn't resync so it has no useful data to aid in recovery.
>>> So you're left with 4 out of 6 disks + 1 spare.
>>>
>>> You have a chance of running the array in degraded mode using sde, sdc, sdd, sdf, assuming these disks are sane.
>>>
>>> Try running this command: mdadm -Af /dev/md0 /dev/sde /dev/sdc /dev/sdd /dev/sdf
>>
>> mdadm: forcing event count in /dev/sdf(1) from 97276 upto 580158
>> mdadm: /dev/md0 has been started with 4 drives (out of 6).
>>
>>> then check: cat /proc/mdstat
>>
>> root@dhcp128:~# cat /proc/mdstat
>> Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
>> md0 : active raid6 sdf[1] sde[5] sdd[3] sdc[2]
>>       5860549632 blocks level 6, 64k chunk, algorithm 2 [6/4] [_UUU_U]
>>
>> unused devices: <none>
>>
>>> If the remaining disks are sane, it should run the array in degraded mode. Hopefully.
>>
>> dmesg
>> [31828.093953] md: md0 stopped.
>> [31838.929607] md: bind<sdc>
>> [31838.931455] md: bind<sdd>
>> [31838.932073] md: bind<sde>
>> [31838.932376] md: bind<sdf>
>> [31838.973346] raid5: device sdf operational as raid disk 1
>> [31838.973349] raid5: device sde operational as raid disk 5
>> [31838.973351] raid5: device sdd operational as raid disk 3
>> [31838.973353] raid5: device sdc operational as raid disk 2
>> [31838.973787] raid5: allocated 6307kB for md0
>> [31838.974165] raid5: raid level 6 set md0 active with 4 out of 6 devices, algorithm 2
>> [31839.066014] RAID5 conf printout:
>> [31839.066016]  --- rd:6 wd:4
>> [31839.066018]  disk 1, o:1, dev:sdf
>> [31839.066020]  disk 2, o:1, dev:sdc
>> [31839.066022]  disk 3, o:1, dev:sdd
>> [31839.066024]  disk 5, o:1, dev:sde
>> [31839.066066] md0: detected capacity change from 0 to 6001202823168
>> [31839.066188]  md0: p1
>>
>> root@dhcp128:/media# fdisk -l /dev/md0
>> Disk /dev/md0: 6001.2 GB, 6001202823168 bytes
>> 255 heads, 63 sectors/track, 729604 cylinders
>> Units = cylinders of 16065 * 512 = 8225280 bytes
>> Disk identifier: 0x96af0591
>>
>>     Device Boot      Start         End      Blocks   Id  System
>> /dev/md0p1               1      182401  1465136001   83  Linux
>>
>> and now the bad news:
>> mount /dev/md0p1 md0p1
>> mount: wrong fs type, bad option, bad superblock on /dev/md0p1
>>
>> [32359.038796] raid5: Disk failure on sde, disabling device.
>> [32359.038797] raid5: Operation continuing on 3 devices.
>>
>>> If that doesn't work, I'd say you're better off scrapping & restoring your data back onto a new array rather than waste more time fiddling with superblocks.
>>
>> Yep. starting that now.
>>
>> This is exactly what I was expecting - very few things to try (like 1) and a very clear pass/fail test.
>>
>> Thanks for helping me get through this.
>>
>>> On Thu, Dec 17, 2009 at 6:06 PM, Carl Karsten <carl@xxxxxxxxxxxxxxxxx> wrote:
>>>> I brought back the 2 externals, which have had mkfs run on them, but maybe the extra superblocks will help (doubt it, but couldn't hurt)
>>>>
>>>> root@dhcp128:/media# mdadm -E /dev/sd[a-z]
>>>> mdadm: No md superblock detected on /dev/sda.
>>>> /dev/sdb:
>>>>           Magic : a92b4efc
>>>>         Version : 00.90.00
>>>>            UUID : 8d0cf436:3fc2d2ef:93d71b24:b036cc6b
>>>>   Creation Time : Wed Mar 25 21:04:08 2009
>>>>      Raid Level : raid6
>>>>   Used Dev Size : 1465137408 (1397.26 GiB 1500.30 GB)
>>>>      Array Size : 5860549632 (5589.06 GiB 6001.20 GB)
>>>>    Raid Devices : 6
>>>>   Total Devices : 6
>>>> Preferred Minor : 0
>>>>
>>>>     Update Time : Tue Mar 31 23:08:02 2009
>>>>           State : clean
>>>>  Active Devices : 5
>>>> Working Devices : 6
>>>>  Failed Devices : 1
>>>>   Spare Devices : 1
>>>>        Checksum : a4fbb93a - correct
>>>>          Events : 8430
>>>>
>>>>      Chunk Size : 64K
>>>>
>>>>       Number   Major   Minor   RaidDevice State
>>>> this     6       8       16        6      spare   /dev/sdb
>>>>
>>>>    0     0       8        0        0      active sync   /dev/sda
>>>>    1     1       8       64        1      active sync   /dev/sde
>>>>    2     2       8       32        2      active sync   /dev/sdc
>>>>    3     3       8       48        3      active sync   /dev/sdd
>>>>    4     4       0        0        4      faulty removed
>>>>    5     5       8       80        5      active sync   /dev/sdf
>>>>    6     6       8       16        6      spare   /dev/sdb
>>>> /dev/sdc:
>>>>           Magic : a92b4efc
>>>>         Version : 00.90.00
>>>>            UUID : 8d0cf436:3fc2d2ef:93d71b24:b036cc6b
>>>>   Creation Time : Wed Mar 25 21:04:08 2009
>>>>      Raid Level : raid6
>>>>   Used Dev Size : 1465137408 (1397.26 GiB 1500.30 GB)
>>>>      Array Size : 5860549632 (5589.06 GiB 6001.20 GB)
>>>>    Raid Devices : 6
>>>>   Total Devices : 4
>>>> Preferred Minor : 0
>>>>
>>>>     Update Time : Sun Jul 12 11:31:47 2009
>>>>           State : clean
>>>>  Active Devices : 4
>>>> Working Devices : 4
>>>>  Failed Devices : 2
>>>>   Spare Devices : 0
>>>>        Checksum : a59452db - correct
>>>>          Events : 580158
>>>>
>>>>      Chunk Size : 64K
>>>>
>>>>       Number   Major   Minor   RaidDevice State
>>>> this     2       8       32        2      active sync   /dev/sdc
>>>>
>>>>    0     0       8        0        0      active sync   /dev/sda
>>>>    1     1       0        0        1      faulty removed
>>>>    2     2       8       32        2      active sync   /dev/sdc
>>>>    3     3       8       48        3      active sync   /dev/sdd
>>>>    4     4       0        0        4      faulty removed
>>>>    5     5       8       96        5      active sync   /dev/sdg
>>>> /dev/sdd:
>>>>           Magic : a92b4efc
>>>>         Version : 00.90.00
>>>>            UUID : 8d0cf436:3fc2d2ef:93d71b24:b036cc6b
>>>>   Creation Time : Wed Mar 25 21:04:08 2009
>>>>      Raid Level : raid6
>>>>   Used Dev Size : 1465137408 (1397.26 GiB 1500.30 GB)
>>>>      Array Size : 5860549632 (5589.06 GiB 6001.20 GB)
>>>>    Raid Devices : 6
>>>>   Total Devices : 4
>>>> Preferred Minor : 0
>>>>
>>>>     Update Time : Sun Jul 12 11:31:47 2009
>>>>           State : clean
>>>>  Active Devices : 4
>>>> Working Devices : 4
>>>>  Failed Devices : 2
>>>>   Spare Devices : 0
>>>>        Checksum : a59452ed - correct
>>>>          Events : 580158
>>>>
>>>>      Chunk Size : 64K
>>>>
>>>>       Number   Major   Minor   RaidDevice State
>>>> this     3       8       48        3      active sync   /dev/sdd
>>>>
>>>>    0     0       8        0        0      active sync   /dev/sda
>>>>    1     1       0        0        1      faulty removed
>>>>    2     2       8       32        2      active sync   /dev/sdc
>>>>    3     3       8       48        3      active sync   /dev/sdd
>>>>    4     4       0        0        4      faulty removed
>>>>    5     5       8       96        5      active sync   /dev/sdg
>>>> /dev/sde:
>>>>           Magic : a92b4efc
>>>>         Version : 00.90.00
>>>>            UUID : 8d0cf436:3fc2d2ef:93d71b24:b036cc6b
>>>>   Creation Time : Wed Mar 25 21:04:08 2009
>>>>      Raid Level : raid6
>>>>   Used Dev Size : 1465137408 (1397.26 GiB 1500.30 GB)
>>>>      Array Size : 5860549632 (5589.06 GiB 6001.20 GB)
>>>>    Raid Devices : 6
>>>>   Total Devices : 4
>>>> Preferred Minor : 0
>>>>
>>>>     Update Time : Sun Jul 12 11:31:47 2009
>>>>           State : clean
>>>>  Active Devices : 4
>>>> Working Devices : 4
>>>>  Failed Devices : 2
>>>>   Spare Devices : 0
>>>>        Checksum : a5945321 - correct
>>>>          Events : 580158
>>>>
>>>>      Chunk Size : 64K
>>>>
>>>>       Number   Major   Minor   RaidDevice State
>>>> this     5       8       96        5      active sync   /dev/sdg
>>>>
>>>>    0     0       8        0        0      active sync   /dev/sda
>>>>    1     1       0        0        1      faulty removed
>>>>    2     2       8       32        2      active sync   /dev/sdc
>>>>    3     3       8       48        3      active sync   /dev/sdd
>>>>    4     4       0        0        4      faulty removed
>>>>    5     5       8       96        5      active sync   /dev/sdg
>>>> /dev/sdf:
>>>>           Magic : a92b4efc
>>>>         Version : 00.90.00
>>>>            UUID : 8d0cf436:3fc2d2ef:93d71b24:b036cc6b
>>>>   Creation Time : Wed Mar 25 21:04:08 2009
>>>>      Raid Level : raid6
>>>>   Used Dev Size : 1465137408 (1397.26 GiB 1500.30 GB)
>>>>      Array Size : 5860549632 (5589.06 GiB 6001.20 GB)
>>>>    Raid Devices : 6
>>>>   Total Devices : 5
>>>> Preferred Minor : 0
>>>>
>>>>     Update Time : Wed Apr 8 11:13:32 2009
>>>>           State : clean
>>>>  Active Devices : 5
>>>> Working Devices : 5
>>>>  Failed Devices : 1
>>>>   Spare Devices : 0
>>>>        Checksum : a5085415 - correct
>>>>          Events : 97276
>>>>
>>>>      Chunk Size : 64K
>>>>
>>>>       Number   Major   Minor   RaidDevice State
>>>> this     1       8       80        1      active sync   /dev/sdf
>>>>
>>>>    0     0       8        0        0      active sync   /dev/sda
>>>>    1     1       8       80        1      active sync   /dev/sdf
>>>>    2     2       8       32        2      active sync   /dev/sdc
>>>>    3     3       8       48        3      active sync   /dev/sdd
>>>>    4     4       0        0        4      faulty removed
>>>>    5     5       8       96        5      active sync   /dev/sdg
>>>> mdadm: No md superblock detected on /dev/sdg.
>>>>
>>>> On Thu, Dec 17, 2009 at 8:39 AM, Majed B. <majedb@xxxxxxxxx> wrote:
>>>>> You can't copy and change bytes to identify disks.
>>>>>
>>>>> To check which disks belong to an array, do this:
>>>>> mdadm -E /dev/sd[a-z]
>>>>>
>>>>> The disks that you get info from belong to the existing array(s).
>>>>>
>>>>> In the first email you sent you included an examine output for one of the disks that listed another disk as a spare (sdb). The output of examine should shed more light.
>>>>>
>>>>> On Thu, Dec 17, 2009 at 5:15 PM, Carl Karsten <carl@xxxxxxxxxxxxxxxxx> wrote:
>>>>>> On Thu, Dec 17, 2009 at 4:35 AM, Majed B. <majedb@xxxxxxxxx> wrote:
>>>>>>> I have misread the information you've provided, so allow me to correct myself:
>>>>>>>
>>>>>>> You're running a RAID6 array, with 2 disks lost/failed. Any disk loss after that will cause data loss since you have no redundancy (2 disks died).
>>>>>>
>>>>>> right - but I am not sure if data loss has occurred, where data is the data being stored on the raid, not the raid metadata.
>>>>>>
>>>>>> My guess is I need to copy the raid superblock from one of the other disks (say sdb), find the bytes that identify the disk and change from sdb to sda.
>>>>>>
>>>>>>> I believe it's still possible to reassemble the array, but you only need to remove the MBR. See this page for information:
>>>>>>> http://www.cyberciti.biz/faq/linux-how-to-uninstall-grub/
>>>>>>> dd if=/dev/zero of=/dev/sdX bs=446 count=1
>>>>>>>
>>>>>>> Before proceeding, provide the output of cat /proc/mdstat
>>>>>>
>>>>>> root@dhcp128:~# cat /proc/mdstat
>>>>>> Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
>>>>>> unused devices: <none>
>>>>>>
>>>>>>> Is the array currently running degraded or is it suspended?
>>>>>>
>>>>>> um, not running, not sure I would call it suspended.
>>>>>>
>>>>>>> What happened to the spare disk assigned?
>>>>>>
>>>>>> I don't understand.
>>>>>>
>>>>>>> Did it finish resyncing before you installed grub on the wrong disk?
>>>>>>
>>>>>> I think so.
>>>>>>
>>>>>> I am fairly sure I could assemble the array before I installed grub.
>>>>>>
>>>>>>> On Thu, Dec 17, 2009 at 8:21 AM, Majed B. <majedb@xxxxxxxxx> wrote:
>>>>>>>> If your other disks are sane and you are able to run a degraded array, then you can remove grub using dd then re-add the disk to the array.
>>>>>>>>
>>>>>>>> To clear the first 1MB of the disk:
>>>>>>>> dd if=/dev/zero of=/dev/sdx bs=1M count=1
>>>>>>>> Replace sdx with the disk name that has grub.
>>>>>>>>
>>>>>>>> On Dec 17, 2009 6:53 AM, "Carl Karsten" <carl@xxxxxxxxxxxxxxxxx> wrote:
>>>>>>>>
>>>>>>>> I took over a box that had 1 ide boot drive, 6 sata raid drives (4 internal, 2 external.) I believe the 2 externals were redundant, so could be removed. so I did, and mkfs-ed them. then I installed ubuntu to the ide, and installed grub to sda, which turns out to be the first sata. which would be fine if the raid was on sda1, but it is on sda, and now the raid won't assemble. no surprise, and I do have a backup of the data spread across 5 external drives. but before I abandon the array, I am wondering if I can fix it by recreating mdadm's metadata on sda, given I have sd[bcd] to work with.
>>>>>>>>
>>>>>>>> any suggestions?
>>>>>>>>
>>>>>>>> root@dhcp128:~# mdadm --examine /dev/sd[abcd]
>>>>>>>> mdadm: No md superblock detected on /dev/sda.
>>>>>>>> /dev/sdb:
>>>>>>>>           Magic : a92b4efc
>>>>>>>>         Version : 00.90.00
>>>>>>>>            UUID : 8d0cf436:3fc2d2ef:93d71b24:b036cc6b
>>>>>>>>   Creation Time : Wed Mar 25 21:04:08 2009
>>>>>>>>      Raid Level : raid6
>>>>>>>>   Used Dev Size : 1465137408 (1397.26 GiB 1500.30 GB)
>>>>>>>>      Array Size : 5860549632 (5589.06 GiB 6001.20 GB)
>>>>>>>>    Raid Devices : 6
>>>>>>>>   Total Devices : 6
>>>>>>>> Preferred Minor : 0
>>>>>>>>
>>>>>>>>     Update Time : Tue Mar 31 23:08:02 2009
>>>>>>>>           State : clean
>>>>>>>>  Active Devices : 5
>>>>>>>> Working Devices : 6
>>>>>>>>  Failed Devices : 1
>>>>>>>>   Spare Devices : 1
>>>>>>>>        Checksum : a4fbb93a - correct
>>>>>>>>          Events : 8430
>>>>>>>>
>>>>>>>>      Chunk Size : 64K
>>>>>>>>
>>>>>>>>       Number   Major   Minor   RaidDevice State
>>>>>>>> this     6       8       16        6      spare   /dev/sdb
>>>>>>>>
>>>>>>>>    0     0       8        0        0      active sync   /dev/sda
>>>>>>>>    1     1       8       64        1      active sync   /dev/sde
>>>>>>>>    2     2       8       32        2      active sync   /dev/sdc
>>>>>>>>    3     3       8       48        3      active sync   /dev/sdd
>>>>>>>>    4     4       0        0        4      faulty removed
>>>>>>>>    5     5       8       80        5      active sync
>>>>>>>>    6     6       8       16        6      spare   /dev/sdb
>>>>>>>> /dev/sdc:
>>>>>>>>           Magic : a92b4efc
>>>>>>>>         Version : 00.90.00
>>>>>>>>            UUID : 8d0cf436:3fc2d2ef:93d71b24:b036cc6b
>>>>>>>>   Creation Time : Wed Mar 25 21:04:08 2009
>>>>>>>>      Raid Level : raid6
>>>>>>>>   Used Dev Size : 1465137408 (1397.26 GiB 1500.30 GB)
>>>>>>>>      Array Size : 5860549632 (5589.06 GiB 6001.20 GB)
>>>>>>>>    Raid Devices : 6
>>>>>>>>   Total Devices : 4
>>>>>>>> Preferred Minor : 0
>>>>>>>>
>>>>>>>>     Update Time : Sun Jul 12 11:31:47 2009
>>>>>>>>           State : clean
>>>>>>>>  Active Devices : 4
>>>>>>>> Working Devices : 4
>>>>>>>>  Failed Devices : 2
>>>>>>>>   Spare Devices : 0
>>>>>>>>        Checksum : a59452db - correct
>>>>>>>>          Events : 580158
>>>>>>>>
>>>>>>>>      Chunk Size : 64K
>>>>>>>>
>>>>>>>>       Number   Major   Minor   RaidDevice State
>>>>>>>> this     2       8       32        2      active sync   /dev/sdc
>>>>>>>>
>>>>>>>>    0     0       8        0        0      active sync   /dev/sda
>>>>>>>>    1     1       0        0        1      faulty removed
>>>>>>>>    2     2       8       32        2      active sync   /dev/sdc
>>>>>>>>    3     3       8       48        3      active sync   /dev/sdd
>>>>>>>>    4     4       0        0        4      faulty removed
>>>>>>>>    5     5       8       96        5      active sync
>>>>>>>> /dev/sdd:
>>>>>>>>           Magic : a92b4efc
>>>>>>>>         Version : 00.90.00
>>>>>>>>            UUID : 8d0cf436:3fc2d2ef:93d71b24:b036cc6b
>>>>>>>>   Creation Time : Wed Mar 25 21:04:08 2009
>>>>>>>>      Raid Level : raid6
>>>>>>>>   Used Dev Size : 1465137408 (1397.26 GiB 1500.30 GB)
>>>>>>>>      Array Size : 5860549632 (5589.06 GiB 6001.20 GB)
>>>>>>>>    Raid Devices : 6
>>>>>>>>   Total Devices : 4
>>>>>>>> Preferred Minor : 0
>>>>>>>>
>>>>>>>>     Update Time : Sun Jul 12 11:31:47 2009
>>>>>>>>           State : clean
>>>>>>>>  Active Devices : 4
>>>>>>>> Working Devices : 4
>>>>>>>>  Failed Devices : 2
>>>>>>>>   Spare Devices : 0
>>>>>>>>        Checksum : a59452ed - correct
>>>>>>>>          Events : 580158
>>>>>>>>
>>>>>>>>      Chunk Size : 64K
>>>>>>>>
>>>>>>>>       Number   Major   Minor   RaidDevice State
>>>>>>>> this     3       8       48        3      active sync   /dev/sdd
>>>>>>>>
>>>>>>>>    0     0       8        0        0      active sync   /dev/sda
>>>>>>>>    1     1       0        0        1      faulty removed
>>>>>>>>    2     2       8       32        2      active sync   /dev/sdc
>>>>>>>>    3     3       8       48        3      active sync   /dev/sdd
>>>>>>>>    4     4       0        0        4      faulty removed
>>>>>>>>    5     5       8       96        5      active sync
>>>>>>>>
>>>>>>>> --
>>>>>>>> Carl K
>
> --
> Majed B.

Also, since you're starting clean, you might find it faster to zero-fill the devices before array creation and then at creation time pass --assume-clean to mdadm. That is supposed to work with raid 5 at least, and no one has mentioned it failing with raid 6. I just ran a test on 6 loop-mounted 1GB sparse (all-zero) files created into a raid 6 array. Zeroing does work with --assume-clean; you can think of it as pre-filling the drives with a known array configuration.

# setting up the loops....
# mdadm /dev/md0 --create -l6 --assume-clean -n6 /dev/loop[0-5] ; echo check > /sys/block/md0/md/sync_action; watch cat /sys/block/md0/md/mismatch_cnt /proc/mdstat
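The loop setup step above is elided, so here is a minimal sketch of one way it might be done (assuming GNU dd and losetup are available; the disk$i.img file names and loop numbers are only illustrative):

for i in 0 1 2 3 4 5; do
    dd if=/dev/zero of=disk$i.img bs=1 count=0 seek=1G   # 1GB sparse backing file; unwritten blocks read back as zeros
    losetup /dev/loop$i disk$i.img                       # attach it as /dev/loop$i
done

Once the check pass triggered through sync_action finishes, a mismatch_cnt that stays at 0 is what confirms that members pre-filled with zeros really do start out parity-consistent under --assume-clean; a non-zero count would mean the shortcut can't be trusted for that layout.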