Re: reconstruct raid superblock

Before you start building the new array, I suggest you install the
smartmontools package, run smartctl -a /dev/sdx on each disk, and
make sure that no errors are reported.

You might run into problems if your disks have bad sectors on them.

If your disks don't have any self-test logs from before, you should run a
long or offline test to make sure they're fully checked:
smartctl -t offline /dev/sdx
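
You can check progress and results afterwards in the disk's self-test log,
for example:
smartctl -l selftest /dev/sdx   (completed tests and any LBAs with errors)
smartctl -c /dev/sdx            (capabilities, including estimated test duration)
These are standard smartctl options; adjust the device name to your disks.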

And you should configure smartd to monitor and run tests periodically.
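For example, something like this in smartd.conf (the path varies by distro:
/etc/smartd.conf or /etc/smartmontools/smartd.conf) monitors every disk,
schedules a short self-test daily at 02:00 and a long one every Saturday
at 03:00, and mails warnings to root:

DEVICESCAN -a -o on -S on -s (S/../.././02|L/../../6/03) -m root

The schedule regex and mail address are only an illustration; tune them to
your setup.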

On Thu, Dec 17, 2009 at 7:17 PM, Carl Karsten <carl@xxxxxxxxxxxxxxxxx> wrote:
> On Thu, Dec 17, 2009 at 9:40 AM, Majed B. <majedb@xxxxxxxxx> wrote:
>> I'm assuming you ran the command with the 2 external disks added to the array.
>> One question before proceeding: When you removed these 2 externals,
>> were there any changes to the array? Did you add/delete/modify any
>> files or rename them?
>
> Shut down the box, unplugged the drives, booted the box.
>
>>
>> What do you mean the 2 externals have had mkfs run on them? Is this
>> AFTER you removed the disks from the array? If so, they're useless
>> now.
>
> That's what I figured.
>
>>
>> The disk names have changed, and the names recorded in the superblocks
>> differ from what udev is now reporting:
>> sde was previously named sdg
>> sdf is sdf
>> sdb is sdb
>> sdc is sdc
>> sdd is sdd
>>
>> According to the listing above, you have superblock info on: sdb, sdc,
>> sdd, sde, sdf; 5 disks out of 7 -- one of which is a spare.
>> sdb was a spare, and according to the other disks' info it never
>> resynced, so it has no useful data to aid in recovery.
>> So you're left with 4 out of 6 disks + 1 spare.
>>
>> You have a chance of running the array in degraded mode using sde,
>> sdc, sdd, sdf, assuming these disks are sane.
>>
>> Try running this command: mdadm -Af /dev/md0 /dev/sde /dev/sdc /dev/sdd /dev/sdf
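>> (If you want a safety net first, you could save each member's current
>> superblock report somewhere off the array before forcing, e.g.:
>> mdadm -E /dev/sde > /root/sde-superblock.txt
>> and likewise for sdc, sdd and sdf. The file name/location is just an
>> example, but the output is handy if you ever need to recreate the array
>> by hand.)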
>
> mdadm: forcing event count in /dev/sdf(1) from 97276 upto 580158
> mdadm: /dev/md0 has been started with 4 drives (out of 6).
>
>
>>
>> then check: cat /proc/mdstat
>
> root@dhcp128:~# cat /proc/mdstat
> Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5]
> [raid4] [raid10]
> md0 : active raid6 sdf[1] sde[5] sdd[3] sdc[2]
>      5860549632 blocks level 6, 64k chunk, algorithm 2 [6/4] [_UUU_U]
>
> unused devices: <none>
>
>>
>> If the remaining disks are sane, it should run the array in degraded
>> mode. Hopefully.
>
> dmesg
> [31828.093953] md: md0 stopped.
> [31838.929607] md: bind<sdc>
> [31838.931455] md: bind<sdd>
> [31838.932073] md: bind<sde>
> [31838.932376] md: bind<sdf>
> [31838.973346] raid5: device sdf operational as raid disk 1
> [31838.973349] raid5: device sde operational as raid disk 5
> [31838.973351] raid5: device sdd operational as raid disk 3
> [31838.973353] raid5: device sdc operational as raid disk 2
> [31838.973787] raid5: allocated 6307kB for md0
> [31838.974165] raid5: raid level 6 set md0 active with 4 out of 6
> devices, algorithm 2
> [31839.066014] RAID5 conf printout:
> [31839.066016]  --- rd:6 wd:4
> [31839.066018]  disk 1, o:1, dev:sdf
> [31839.066020]  disk 2, o:1, dev:sdc
> [31839.066022]  disk 3, o:1, dev:sdd
> [31839.066024]  disk 5, o:1, dev:sde
> [31839.066066] md0: detected capacity change from 0 to 6001202823168
> [31839.066188]  md0: p1
>
> root@dhcp128:/media# fdisk -l /dev/md0
> Disk /dev/md0: 6001.2 GB, 6001202823168 bytes
> 255 heads, 63 sectors/track, 729604 cylinders
> Units = cylinders of 16065 * 512 = 8225280 bytes
> Disk identifier: 0x96af0591
>    Device Boot      Start         End      Blocks   Id  System
> /dev/md0p1               1      182401  1465136001   83  Linux
>
> and now the bad news:
> mount /dev/md0p1 md0p1
> mount: wrong fs type, bad option, bad superblock on /dev/md0p1
>
> [32359.038796] raid5: Disk failure on sde, disabling device.
> [32359.038797] raid5: Operation continuing on 3 devices.
>
>>
>> If that doesn't work, I'd say you're better off scrapping the array and
>> restoring your data onto a new one rather than wasting more time fiddling
>> with superblocks.
>
> Yep.  starting that now.
>
> This is exactly what I was expecting - very few things to try (like 1)
> and a very clear pass/fail test.
>
> Thanks for helping me get through this.
>
>
>>
>> On Thu, Dec 17, 2009 at 6:06 PM, Carl Karsten <carl@xxxxxxxxxxxxxxxxx> wrote:
>>> I brought back the 2 externals, which have had mkfs run on them, but
>>> maybe the extra superblocks will help (I doubt it, but it couldn't hurt).
>>>
>>> root@dhcp128:/media# mdadm -E /dev/sd[a-z]
>>> mdadm: No md superblock detected on /dev/sda.
>>> /dev/sdb:
>>>          Magic : a92b4efc
>>>        Version : 00.90.00
>>>           UUID : 8d0cf436:3fc2d2ef:93d71b24:b036cc6b
>>>  Creation Time : Wed Mar 25 21:04:08 2009
>>>     Raid Level : raid6
>>>  Used Dev Size : 1465137408 (1397.26 GiB 1500.30 GB)
>>>     Array Size : 5860549632 (5589.06 GiB 6001.20 GB)
>>>   Raid Devices : 6
>>>  Total Devices : 6
>>> Preferred Minor : 0
>>>
>>>    Update Time : Tue Mar 31 23:08:02 2009
>>>          State : clean
>>>  Active Devices : 5
>>> Working Devices : 6
>>>  Failed Devices : 1
>>>  Spare Devices : 1
>>>       Checksum : a4fbb93a - correct
>>>         Events : 8430
>>>
>>>     Chunk Size : 64K
>>>
>>>      Number   Major   Minor   RaidDevice State
>>> this     6       8       16        6      spare   /dev/sdb
>>>
>>>   0     0       8        0        0      active sync   /dev/sda
>>>   1     1       8       64        1      active sync   /dev/sde
>>>   2     2       8       32        2      active sync   /dev/sdc
>>>   3     3       8       48        3      active sync   /dev/sdd
>>>   4     4       0        0        4      faulty removed
>>>   5     5       8       80        5      active sync   /dev/sdf
>>>   6     6       8       16        6      spare   /dev/sdb
>>> /dev/sdc:
>>>          Magic : a92b4efc
>>>        Version : 00.90.00
>>>           UUID : 8d0cf436:3fc2d2ef:93d71b24:b036cc6b
>>>  Creation Time : Wed Mar 25 21:04:08 2009
>>>     Raid Level : raid6
>>>  Used Dev Size : 1465137408 (1397.26 GiB 1500.30 GB)
>>>     Array Size : 5860549632 (5589.06 GiB 6001.20 GB)
>>>   Raid Devices : 6
>>>  Total Devices : 4
>>> Preferred Minor : 0
>>>
>>>    Update Time : Sun Jul 12 11:31:47 2009
>>>          State : clean
>>>  Active Devices : 4
>>> Working Devices : 4
>>>  Failed Devices : 2
>>>  Spare Devices : 0
>>>       Checksum : a59452db - correct
>>>         Events : 580158
>>>
>>>     Chunk Size : 64K
>>>
>>>      Number   Major   Minor   RaidDevice State
>>> this     2       8       32        2      active sync   /dev/sdc
>>>
>>>   0     0       8        0        0      active sync   /dev/sda
>>>   1     1       0        0        1      faulty removed
>>>   2     2       8       32        2      active sync   /dev/sdc
>>>   3     3       8       48        3      active sync   /dev/sdd
>>>   4     4       0        0        4      faulty removed
>>>   5     5       8       96        5      active sync   /dev/sdg
>>> /dev/sdd:
>>>          Magic : a92b4efc
>>>        Version : 00.90.00
>>>           UUID : 8d0cf436:3fc2d2ef:93d71b24:b036cc6b
>>>  Creation Time : Wed Mar 25 21:04:08 2009
>>>     Raid Level : raid6
>>>  Used Dev Size : 1465137408 (1397.26 GiB 1500.30 GB)
>>>     Array Size : 5860549632 (5589.06 GiB 6001.20 GB)
>>>   Raid Devices : 6
>>>  Total Devices : 4
>>> Preferred Minor : 0
>>>
>>>    Update Time : Sun Jul 12 11:31:47 2009
>>>          State : clean
>>>  Active Devices : 4
>>> Working Devices : 4
>>>  Failed Devices : 2
>>>  Spare Devices : 0
>>>       Checksum : a59452ed - correct
>>>         Events : 580158
>>>
>>>     Chunk Size : 64K
>>>
>>>      Number   Major   Minor   RaidDevice State
>>> this     3       8       48        3      active sync   /dev/sdd
>>>
>>>   0     0       8        0        0      active sync   /dev/sda
>>>   1     1       0        0        1      faulty removed
>>>   2     2       8       32        2      active sync   /dev/sdc
>>>   3     3       8       48        3      active sync   /dev/sdd
>>>   4     4       0        0        4      faulty removed
>>>   5     5       8       96        5      active sync   /dev/sdg
>>> /dev/sde:
>>>          Magic : a92b4efc
>>>        Version : 00.90.00
>>>           UUID : 8d0cf436:3fc2d2ef:93d71b24:b036cc6b
>>>  Creation Time : Wed Mar 25 21:04:08 2009
>>>     Raid Level : raid6
>>>  Used Dev Size : 1465137408 (1397.26 GiB 1500.30 GB)
>>>     Array Size : 5860549632 (5589.06 GiB 6001.20 GB)
>>>   Raid Devices : 6
>>>  Total Devices : 4
>>> Preferred Minor : 0
>>>
>>>    Update Time : Sun Jul 12 11:31:47 2009
>>>          State : clean
>>>  Active Devices : 4
>>> Working Devices : 4
>>>  Failed Devices : 2
>>>  Spare Devices : 0
>>>       Checksum : a5945321 - correct
>>>         Events : 580158
>>>
>>>     Chunk Size : 64K
>>>
>>>      Number   Major   Minor   RaidDevice State
>>> this     5       8       96        5      active sync   /dev/sdg
>>>
>>>   0     0       8        0        0      active sync   /dev/sda
>>>   1     1       0        0        1      faulty removed
>>>   2     2       8       32        2      active sync   /dev/sdc
>>>   3     3       8       48        3      active sync   /dev/sdd
>>>   4     4       0        0        4      faulty removed
>>>   5     5       8       96        5      active sync   /dev/sdg
>>> /dev/sdf:
>>>          Magic : a92b4efc
>>>        Version : 00.90.00
>>>           UUID : 8d0cf436:3fc2d2ef:93d71b24:b036cc6b
>>>  Creation Time : Wed Mar 25 21:04:08 2009
>>>     Raid Level : raid6
>>>  Used Dev Size : 1465137408 (1397.26 GiB 1500.30 GB)
>>>     Array Size : 5860549632 (5589.06 GiB 6001.20 GB)
>>>   Raid Devices : 6
>>>  Total Devices : 5
>>> Preferred Minor : 0
>>>
>>>    Update Time : Wed Apr  8 11:13:32 2009
>>>          State : clean
>>>  Active Devices : 5
>>> Working Devices : 5
>>>  Failed Devices : 1
>>>  Spare Devices : 0
>>>       Checksum : a5085415 - correct
>>>         Events : 97276
>>>
>>>     Chunk Size : 64K
>>>
>>>      Number   Major   Minor   RaidDevice State
>>> this     1       8       80        1      active sync   /dev/sdf
>>>
>>>   0     0       8        0        0      active sync   /dev/sda
>>>   1     1       8       80        1      active sync   /dev/sdf
>>>   2     2       8       32        2      active sync   /dev/sdc
>>>   3     3       8       48        3      active sync   /dev/sdd
>>>   4     4       0        0        4      faulty removed
>>>   5     5       8       96        5      active sync   /dev/sdg
>>> mdadm: No md superblock detected on /dev/sdg.
>>>
>>>
>>>
>>> On Thu, Dec 17, 2009 at 8:39 AM, Majed B. <majedb@xxxxxxxxx> wrote:
>>>> You can't just copy a superblock and change a few bytes to re-identify a disk.
>>>>
>>>> To check which disks belong to an array, do this:
>>>> mdadm -E /dev/sd[a-z]
>>>>
>>>> The disks that you get info from belong to the existing array(s).
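>>>> If you just want to group members by array, filtering the examine output
>>>> down to device names and UUIDs works too, e.g.:
>>>> mdadm -E /dev/sd[a-z] | grep -E '/dev/sd|UUID'
>>>> Disks that share a UUID belong to the same array.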
>>>>
>>>> In your first email you included examine output for one of the disks,
>>>> which listed another disk (sdb) as a spare. The full examine output
>>>> should shed more light.
>>>>
>>>> On Thu, Dec 17, 2009 at 5:15 PM, Carl Karsten <carl@xxxxxxxxxxxxxxxxx> wrote:
>>>>> On Thu, Dec 17, 2009 at 4:35 AM, Majed B. <majedb@xxxxxxxxx> wrote:
>>>>>> I have misread the information you've provided, so allow me to correct myself:
>>>>>>
>>>>>> You're running a RAID6 array with 2 disks lost/failed. Any further disk
>>>>>> loss will cause data loss, since with 2 disks gone you have no
>>>>>> redundancy left.
>>>>>
>>>>> Right - but I am not sure if data loss has occurred; by data I mean the
>>>>> files stored on the raid, not the raid metadata.
>>>>>
>>>>> My guess is I need to copy the raid superblock from one of the other
>>>>> disks (say sdb), find the bytes that identify the disk, and change them
>>>>> from sdb to sda.
>>>>>
>>>>>>
>>>>>> I believe it's still possible to reassemble the array, but you only
>>>>>> need to remove the MBR boot code that grub wrote. See this page for
>>>>>> information:
>>>>>> http://www.cyberciti.biz/faq/linux-how-to-uninstall-grub/
>>>>>> dd if=/dev/zero of=/dev/sdX bs=446 count=1
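>>>>>> (If you want to be able to undo that, save the first sector before
>>>>>> zeroing it, e.g.:
>>>>>> dd if=/dev/sdX of=/root/sdX-mbr-backup.bin bs=512 count=1
>>>>>> The backup file name/location is just an example.)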
>>>>>>
>>>>>> Before proceeding, provide the output of cat /proc/mdstat
>>>>>
>>>>> root@dhcp128:~# cat /proc/mdstat
>>>>> Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5]
>>>>> [raid4] [raid10]
>>>>> unused devices: <none>
>>>>>
>>>>>
>>>>>> Is the array currently running degraded or is it suspended?
>>>>>
>>>>> um, not running, not sure I would call it suspended.
>>>>>
>>>>>> What happened to the assigned spare disk?
>>>>>
>>>>> I don't understand.
>>>>>
>>>>>> Did it finish resyncing
>>>>>> before you installed grub on the wrong disk?
>>>>>
>>>>> I think so.
>>>>>
>>>>> I am fairly sure I could assemble the array before I installed grub.
>>>>>
>>>>>>
>>>>>> On Thu, Dec 17, 2009 at 8:21 AM, Majed B. <majedb@xxxxxxxxx> wrote:
>>>>>>> If your other disks are sane and you are able to run a degraded array, then
>>>>>>> you can remove grub using dd and re-add the disk to the array (example below).
>>>>>>>
>>>>>>> To clear the first 1MB of the disk:
>>>>>>> dd if=/dev/zero of=/dev/sdx bs=1M count=1
>>>>>>> Replace sdx with the disk name that has grub.
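>>>>>>> Re-adding it afterwards would be the usual mdadm hot-add, something like:
>>>>>>> mdadm /dev/md0 --add /dev/sdx
>>>>>>> (assuming the array is assembled as /dev/md0; it will then resync onto
>>>>>>> that disk).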
>>>>>>>
>>>>>>> On Dec 17, 2009 6:53 AM, "Carl Karsten" <carl@xxxxxxxxxxxxxxxxx> wrote:
>>>>>>>
>>>>>>> I took over a box that had 1 IDE boot drive and 6 SATA raid drives (4
>>>>>>> internal, 2 external). I believe the 2 externals were redundant, so they
>>>>>>> could be removed. So I did, and mkfs-ed them. Then I installed ubuntu to
>>>>>>> the IDE drive, and installed grub to sda, which turns out to be the first
>>>>>>> SATA drive. That would be fine if the raid was on sda1, but it is on sda,
>>>>>>> and now the raid won't assemble. No surprise, and I do have a backup of
>>>>>>> the data spread across 5 external drives. But before I abandon the array,
>>>>>>> I am wondering if I can fix it by recreating mdadm's metadata on sda,
>>>>>>> given I have sd[bcd] to work with.
>>>>>>>
>>>>>>> any suggestions?
>>>>>>>
>>>>>>> root@dhcp128:~# mdadm --examine /dev/sd[abcd]
>>>>>>> mdadm: No md superblock detected on /dev/sda.
>>>>>>> /dev/sdb:
>>>>>>>          Magic : a92b4efc
>>>>>>>        Version : 00.90.00
>>>>>>>           UUID : 8d0cf436:3fc2d2ef:93d71b24:b036cc6b
>>>>>>>  Creation Time : Wed Mar 25 21:04:08 2009
>>>>>>>     Raid Level : raid6
>>>>>>>  Used Dev Size : 1465137408 (1397.26 GiB 1500.30 GB)
>>>>>>>     Array Size : 5860549632 (5589.06 GiB 6001.20 GB)
>>>>>>>   Raid Devices : 6
>>>>>>>  Total Devices : 6
>>>>>>> Preferred Minor : 0
>>>>>>>
>>>>>>>    Update Time : Tue Mar 31 23:08:02 2009
>>>>>>>          State : clean
>>>>>>>  Active Devices : 5
>>>>>>> Working Devices : 6
>>>>>>>  Failed Devices : 1
>>>>>>>  Spare Devices : 1
>>>>>>>       Checksum : a4fbb93a - correct
>>>>>>>         Events : 8430
>>>>>>>
>>>>>>>     Chunk Size : 64K
>>>>>>>
>>>>>>>      Number   Major   Minor   RaidDevice State
>>>>>>> this     6       8       16        6      spare   /dev/sdb
>>>>>>>
>>>>>>>   0     0       8        0        0      active sync   /dev/sda
>>>>>>>   1     1       8       64        1      active sync   /dev/sde
>>>>>>>   2     2       8       32        2      active sync   /dev/sdc
>>>>>>>   3     3       8       48        3      active sync   /dev/sdd
>>>>>>>   4     4       0        0        4      faulty removed
>>>>>>>   5     5       8       80        5      active sync
>>>>>>>   6     6       8       16        6      spare   /dev/sdb
>>>>>>> /dev/sdc:
>>>>>>>          Magic : a92b4efc
>>>>>>>        Version : 00.90.00
>>>>>>>           UUID : 8d0cf436:3fc2d2ef:93d71b24:b036cc6b
>>>>>>>  Creation Time : Wed Mar 25 21:04:08 2009
>>>>>>>     Raid Level : raid6
>>>>>>>  Used Dev Size : 1465137408 (1397.26 GiB 1500.30 GB)
>>>>>>>     Array Size : 5860549632 (5589.06 GiB 6001.20 GB)
>>>>>>>   Raid Devices : 6
>>>>>>>  Total Devices : 4
>>>>>>> Preferred Minor : 0
>>>>>>>
>>>>>>>    Update Time : Sun Jul 12 11:31:47 2009
>>>>>>>          State : clean
>>>>>>>  Active Devices : 4
>>>>>>> Working Devices : 4
>>>>>>>  Failed Devices : 2
>>>>>>>  Spare Devices : 0
>>>>>>>       Checksum : a59452db - correct
>>>>>>>         Events : 580158
>>>>>>>
>>>>>>>     Chunk Size : 64K
>>>>>>>
>>>>>>>      Number   Major   Minor   RaidDevice State
>>>>>>> this     2       8       32        2      active sync   /dev/sdc
>>>>>>>
>>>>>>>   0     0       8        0        0      active sync   /dev/sda
>>>>>>>   1     1       0        0        1      faulty removed
>>>>>>>   2     2       8       32        2      active sync   /dev/sdc
>>>>>>>   3     3       8       48        3      active sync   /dev/sdd
>>>>>>>   4     4       0        0        4      faulty removed
>>>>>>>   5     5       8       96        5      active sync
>>>>>>> /dev/sdd:
>>>>>>>          Magic : a92b4efc
>>>>>>>        Version : 00.90.00
>>>>>>>           UUID : 8d0cf436:3fc2d2ef:93d71b24:b036cc6b
>>>>>>>  Creation Time : Wed Mar 25 21:04:08 2009
>>>>>>>     Raid Level : raid6
>>>>>>>  Used Dev Size : 1465137408 (1397.26 GiB 1500.30 GB)
>>>>>>>     Array Size : 5860549632 (5589.06 GiB 6001.20 GB)
>>>>>>>   Raid Devices : 6
>>>>>>>  Total Devices : 4
>>>>>>> Preferred Minor : 0
>>>>>>>
>>>>>>>    Update Time : Sun Jul 12 11:31:47 2009
>>>>>>>          State : clean
>>>>>>>  Active Devices : 4
>>>>>>> Working Devices : 4
>>>>>>>  Failed Devices : 2
>>>>>>>  Spare Devices : 0
>>>>>>>       Checksum : a59452ed - correct
>>>>>>>         Events : 580158
>>>>>>>
>>>>>>>     Chunk Size : 64K
>>>>>>>
>>>>>>>      Number   Major   Minor   RaidDevice State
>>>>>>> this     3       8       48        3      active sync   /dev/sdd
>>>>>>>
>>>>>>>   0     0       8        0        0      active sync   /dev/sda
>>>>>>>   1     1       0        0        1      faulty removed
>>>>>>>   2     2       8       32        2      active sync   /dev/sdc
>>>>>>>   3     3       8       48        3      active sync   /dev/sdd
>>>>>>>   4     4       0        0        4      faulty removed
>>>>>>>   5     5       8       96        5      active sync
>>>>>>>
>>>>>>> --
>>>>>>> Carl K
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>>       Majed B.
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Carl K
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>>       Majed B.
>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> Carl K
>>>
>>
>>
>>
>> --
>>       Majed B.
>>
>>
>
>
>
> --
> Carl K
>



-- 
       Majed B.
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
