Re: On RAID5 read error during syncing - array .A.A

Just to double check, would this be the right command to run?

mdadm --create --assume-clean --level=5 --size=5858385408
--raid-devices=4 /dev/md0 missing /dev/sdb3 /dev/sdc3 /dev/sdd3

Are there any other options I would need to add? Should I specify
--chunk and --size (and did I enter the right size)?

By the way, thanks for the help.

On Mon, Dec 8, 2014 at 4:48 AM, Robin Hill <robin@xxxxxxxxxxxxxxx> wrote:
> On Sat Dec 06, 2014 at 03:49:10PM -0500, Emery Guevremont wrote:
>> On Sat, Dec 6, 2014 at 1:56 PM, Robin Hill <robin@xxxxxxxxxxxxxxx> wrote:
>> > On Sat Dec 06, 2014 at 01:35:50PM -0500, Emery Guevremont wrote:
>> >
>> >> The long story and what I've done.
>> >>
>> >> /dev/md0 is assembled with 4 drives
>> >> /dev/sda3
>> >> /dev/sdb3
>> >> /dev/sdc3
>> >> /dev/sdd3
>> >>
>> >> Two weeks ago, mdadm marked /dev/sda3 as failed. cat /proc/mdstat
>> >> showed _UUU. smartctl also confirmed that the drive was dying. So I
>> >> shut down the server until I received a replacement drive.
>> >>
>> >> This week, I replaced the dying drive with my new drive. Booted into
>> >> single user mode and did this:
>> >>
>> >> mdadm --manage /dev/md0 --add /dev/sda3
>> >>
>> >> A cat of /proc/mdstat confirmed the resyncing process. The last time
>> >> I checked it was up to 11%. A few minutes later, I noticed that the
>> >> syncing had stopped. A read error message for /dev/sdd3 (I have a pic
>> >> of it if interested) appeared on the console. It appears that
>> >> /dev/sdd3 might be going bad. A cat of /proc/mdstat showed _U_U. Now
>> >> I panicked and decided to leave everything as is and go to bed.
>> >>
>> >> The next day, I shut down the server and rebooted with a live USB
>> >> distro (Ubuntu Rescue Remix). After booting into the live distro, a
>> >> cat of /proc/mdstat showed that my /dev/md0 was detected, but all
>> >> drives had an (S) next to them, i.e. /dev/sda3 (S)... Naturally I
>> >> don't like the looks of this.
>> >>
>> >> I ran ddrescue to copy /dev/sdd onto my new replacement disk
>> >> (/dev/sda). Everything worked; ddrescue got only one read error, but
>> >> was eventually able to read the bad sector on a retry. I followed up
>> >> by also cloning sdb and sdc with ddrescue.
>> >>
>> >> So now I have cloned copies of sdb, sdc and sdd to work with.
>> >> Currently, running mdadm --assemble --scan will activate my array,
>> >> but all drives are added as spares. Running mdadm --examine on each
>> >> drive shows the same Array UUID, but Raid Devices is 0 and the raid
>> >> level is -unknown- for some reason. The rest seems fine and makes
>> >> sense. I believe I could re-assemble my array if I could define the
>> >> raid level and raid devices.
>> >>
>> >> I wanted to know if there is a way to restore my superblocks from
>> >> the examine output I captured at the beginning. If not, what mdadm
>> >> create command should I run? Also, please let me know if drive
>> >> ordering is important, and how I can determine it from the examine
>> >> output I've got.
>> >>
>> >> Thank you.
>> >>
>> > Have you tried --assemble --force? You'll need to make sure the array's
>> > stopped first, but that's the usual way to get the array back up and
>> > running in that sort of situation.
>> >
>> > If that doesn't work, stop the array again and post:
>> >  - the output from mdadm --assemble --force --verbose /dev/md0 /dev/sd[bcd]3
>> >  - any dmesg output corresponding with the above
>> >  - --examine output for all disks
>> >  - kernel and mdadm versions
>> >
>> > Good luck,
>> >     Robin
>
>> You'll see from the examine output that the raid level and raid
>> devices aren't defined; also notice the role of each drive. The
>> examine output (I attached 4 files) that I took right after the read
>> error during the syncing process seems to show a more accurate
>> superblock. Here's also the output of mdadm --detail /dev/md0 that I
>> took when I got the first error:
>>
>> ARRAY /dev/md/0 metadata=1.2 UUID=cf9db8fa:0c2bb553:46865912:704cceae
>> name=runts:0
>>    spares=1
>>
>>
>> Here's the output of how things currently are:
>>
>> mdadm --assemble --force /dev/md127 /dev/sdb3 /dev/sdc3 /dev/sdd3
>> mdadm: /dev/md127 assembled from 0 drives and 3 spares - not enough to
>> start the array.
>>
>> dmesg
>> [27903.423895] md: md127 stopped.
>> [27903.434327] md: bind<sdc3>
>> [27903.434767] md: bind<sdd3>
>> [27903.434963] md: bind<sdb3>
>>
>> cat /proc/mdstat
>> root@ubuntu:~# cat /proc/mdstat
>> Personalities : [raid6] [raid5] [raid4] [linear] [multipath] [raid0]
>> [raid1] [raid10]
>> md127 : inactive sdb3[4](S) sdd3[0](S) sdc3[5](S)
>>       5858387208 blocks super 1.2
>>
>> mdadm --examine /dev/sd[bcd]3
>> /dev/sdb3:
>>           Magic : a92b4efc
>>         Version : 1.2
>>     Feature Map : 0x0
>>      Array UUID : cf9db8fa:0c2bb553:46865912:704cceae
>>            Name : runts:0
>>   Creation Time : Tue Jul 26 03:27:39 2011
>>      Raid Level : -unknown-
>>    Raid Devices : 0
>>
>>  Avail Dev Size : 3905591472 (1862.33 GiB 1999.66 GB)
>>     Data Offset : 2048 sectors
>>    Super Offset : 8 sectors
>>           State : active
>>     Device UUID : b2bf0462:e0722254:0e233a72:aa5df4da
>>
>>     Update Time : Sat Dec  6 12:46:40 2014
>>        Checksum : 5e8cfc9a - correct
>>          Events : 1
>>
>>
>>    Device Role : spare
>>    Array State :  ('A' == active, '.' == missing)
>> /dev/sdc3:
>>           Magic : a92b4efc
>>         Version : 1.2
>>     Feature Map : 0x0
>>      Array UUID : cf9db8fa:0c2bb553:46865912:704cceae
>>            Name : runts:0
>>   Creation Time : Tue Jul 26 03:27:39 2011
>>      Raid Level : -unknown-
>>    Raid Devices : 0
>>
>>  Avail Dev Size : 3905591472 (1862.33 GiB 1999.66 GB)
>>     Data Offset : 2048 sectors
>>    Super Offset : 8 sectors
>>           State : active
>>     Device UUID : 390bd4a2:07a28c01:528ed41e:a9d0fcf0
>>
>>     Update Time : Sat Dec  6 12:46:40 2014
>>        Checksum : f69518c - correct
>>          Events : 1
>>
>>
>>    Device Role : spare
>>    Array State :  ('A' == active, '.' == missing)
>> /dev/sdd3:
>>           Magic : a92b4efc
>>         Version : 1.2
>>     Feature Map : 0x0
>>      Array UUID : cf9db8fa:0c2bb553:46865912:704cceae
>>            Name : runts:0
>>   Creation Time : Tue Jul 26 03:27:39 2011
>>      Raid Level : -unknown-
>>    Raid Devices : 0
>>
>>  Avail Dev Size : 3905591472 (1862.33 GiB 1999.66 GB)
>>     Data Offset : 2048 sectors
>>    Super Offset : 8 sectors
>>           State : active
>>     Device UUID : 92589cc2:9d5ed86c:1467efc2:2e6b7f09
>>
>>     Update Time : Sat Dec  6 12:46:40 2014
>>        Checksum : 571ad2bd - correct
>>          Events : 1
>>
>>
>>    Device Role : spare
>>    Array State :  ('A' == active, '.' == missing)
>>
>> and finally kernel and mdadm versions:
>>
>> uname -a
>> Linux ubuntu 3.2.0-23-generic #36-Ubuntu SMP Tue Apr 10 20:41:14 UTC
>> 2012 i686 i686 i386 GNU/Linux
>>
>> mdadm -V
>> mdadm - v3.2.3 - 23rd December 2011
>
> The missing data looks similar to a bug fixed a couple of years ago
> (http://neil.brown.name/blog/20120615073245), though the kernel versions
> don't match and the missing data is somewhat different - it may be that
> the relevant patches were backported to the vendor kernel you're using.
>
> With that data missing there's no way to assemble though, so a re-create
> is required in this case (it's a last resort, but I don't see any other
> option).
>
>> /dev/sda3:
>>           Magic : a92b4efc
>>         Version : 1.2
>>     Feature Map : 0x0
>>      Array UUID : cf9db8fa:0c2bb553:46865912:704cceae
>>            Name : runts:0  (local to host runts)
>>   Creation Time : Mon Jul 25 23:27:39 2011
>>      Raid Level : raid5
>>    Raid Devices : 4
>>
>>  Avail Dev Size : 3905591472 (1862.33 GiB 1999.66 GB)
>>      Array Size : 5858385408 (5586.99 GiB 5998.99 GB)
>>   Used Dev Size : 3905590272 (1862.33 GiB 1999.66 GB)
>>     Data Offset : 2048 sectors
>>    Super Offset : 8 sectors
>>           State : clean
>>     Device UUID : b2bf0462:e0722254:0e233a72:aa5df4da
>>
>>     Update Time : Tue Dec  2 23:15:37 2014
>>        Checksum : 5ed5b898 - correct
>>          Events : 3925676
>>
>>          Layout : left-symmetric
>>      Chunk Size : 512K
>>
>>    Device Role : spare
>>    Array State : A.A. ('A' == active, '.' == missing)
>
>> /dev/sdb3:
>>           Magic : a92b4efc
>>         Version : 1.2
>>     Feature Map : 0x0
>>      Array UUID : cf9db8fa:0c2bb553:46865912:704cceae
>>            Name : runts:0  (local to host runts)
>>   Creation Time : Mon Jul 25 23:27:39 2011
>>      Raid Level : raid5
>>    Raid Devices : 4
>>
>>  Avail Dev Size : 3905591472 (1862.33 GiB 1999.66 GB)
>>      Array Size : 5858385408 (5586.99 GiB 5998.99 GB)
>>   Used Dev Size : 3905590272 (1862.33 GiB 1999.66 GB)
>>     Data Offset : 2048 sectors
>>    Super Offset : 8 sectors
>>           State : clean
>>     Device UUID : 92589cc2:9d5ed86c:1467efc2:2e6b7f09
>>
>>     Update Time : Tue Dec  2 23:15:37 2014
>>        Checksum : 57638ebb - correct
>>          Events : 3925676
>>
>>          Layout : left-symmetric
>>      Chunk Size : 512K
>>
>>    Device Role : Active device 0
>>    Array State : A.A. ('A' == active, '.' == missing)
>
>> /dev/sdc3:
>>           Magic : a92b4efc
>>         Version : 1.2
>>     Feature Map : 0x0
>>      Array UUID : cf9db8fa:0c2bb553:46865912:704cceae
>>            Name : runts:0  (local to host runts)
>>   Creation Time : Mon Jul 25 23:27:39 2011
>>      Raid Level : raid5
>>    Raid Devices : 4
>>
>>  Avail Dev Size : 3905591472 (1862.33 GiB 1999.66 GB)
>>      Array Size : 5858385408 (5586.99 GiB 5998.99 GB)
>>   Used Dev Size : 3905590272 (1862.33 GiB 1999.66 GB)
>>     Data Offset : 2048 sectors
>>    Super Offset : 8 sectors
>>           State : clean
>>     Device UUID : 390bd4a2:07a28c01:528ed41e:a9d0fcf0
>>
>>     Update Time : Tue Dec  2 23:15:37 2014
>>        Checksum : fb20d8a - correct
>>          Events : 3925676
>>
>>          Layout : left-symmetric
>>      Chunk Size : 512K
>>
>>    Device Role : Active device 2
>>    Array State : A.A. ('A' == active, '.' == missing)
>
>> /dev/sdd3:
>>           Magic : a92b4efc
>>         Version : 1.2
>>     Feature Map : 0x0
>>      Array UUID : cf9db8fa:0c2bb553:46865912:704cceae
>>            Name : runts:0  (local to host runts)
>>   Creation Time : Mon Jul 25 23:27:39 2011
>>      Raid Level : raid5
>>    Raid Devices : 4
>>
>>  Avail Dev Size : 3905591472 (1862.33 GiB 1999.66 GB)
>>      Array Size : 5858385408 (5586.99 GiB 5998.99 GB)
>>   Used Dev Size : 3905590272 (1862.33 GiB 1999.66 GB)
>>     Data Offset : 2048 sectors
>>    Super Offset : 8 sectors
>>           State : clean
>>     Device UUID : 4156ab46:bd42c10d:8565d5af:74856641
>>
>>     Update Time : Tue Dec  2 23:14:03 2014
>>        Checksum : a126853f - correct
>>          Events : 3925672
>>
>>          Layout : left-symmetric
>>      Chunk Size : 512K
>>
>>    Device Role : Active device 1
>>    Array State : AAAA ('A' == active, '.' == missing)
>
> At least you have the previous data anyway, which should allow
> reconstruction of the array. The device names have changed between your
> two reports though, so I'd advise double-checking which is which before
> proceeding.
>
> The reports indicate that the original array order (based on the device
> role field) for the four devices was (using device UUIDs as they're
> consistent):
>     92589cc2:9d5ed86c:1467efc2:2e6b7f09
>     4156ab46:bd42c10d:8565d5af:74856641
>     390bd4a2:07a28c01:528ed41e:a9d0fcf0
>     b2bf0462:e0722254:0e233a72:aa5df4da
>
> That would give a current device order of sdd3,sda3,sdc3,sdb3 (I don't
> have the current data for sda3, but that's the only missing UUID).
>
> The create command would therefore be:
>     mdadm -C -l 5 -n 4 -c 512 -e 1.2 -z 1952795136 \
>         /dev/md0 /dev/sdd3 /dev/sda3 /dev/sdc3 missing
>
> mdadm 3.2.3 should use a data offset of 2048, the same as your old
> array, but you may want to double-check that with a test array on a
> couple of loopback devices first. If not, you'll need to grab the
> latest release and add the --data-offset=2048 parameter to the above
> create command.
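
(Noting my plan here so you can correct me: to check which data offset my
mdadm version picks, I was going to run a quick loopback test roughly like
the following. The file names and sizes are just placeholders I chose, and
I realise the offset picked for tiny loop devices may not exactly match
what it would pick for the real 2TB members:)

  # create two small backing files and attach them to loop devices
  truncate -s 200M /tmp/md-test-0.img /tmp/md-test-1.img
  losetup /dev/loop0 /tmp/md-test-0.img
  losetup /dev/loop1 /tmp/md-test-1.img

  # build a throwaway degraded RAID5 and inspect the offset it was given
  mdadm --create /dev/md9 --level=5 --raid-devices=3 --chunk=512 \
      --metadata=1.2 /dev/loop0 /dev/loop1 missing
  mdadm --examine /dev/loop0 | grep 'Data Offset'

  # tear the test array down again
  mdadm --stop /dev/md9
  losetup -d /dev/loop0
  losetup -d /dev/loop1
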
>
> You should also follow the instructions for using overlay files at
> https://raid.wiki.kernel.org/index.php/Recovering_a_failed_software_RAID
> in order to safely test out the above without risking damage to the
> array data.
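
(Again, just so you can sanity-check my reading of that wiki page: my
understanding is that each member gets a copy-on-write overlay via a
device-mapper snapshot, roughly like this for one member. /dev/sdb3 is
only the example, and the overlay file path and size are my own choices:)

  # sparse file to absorb any writes made during testing
  truncate -s 50G /mnt/scratch/overlay-sdb3
  losetup /dev/loop2 /mnt/scratch/overlay-sdb3

  # snapshot target: reads come from /dev/sdb3, writes go to the overlay
  SIZE=$(blockdev --getsz /dev/sdb3)
  echo "0 $SIZE snapshot /dev/sdb3 /dev/loop2 P 8" | dmsetup create sdb3-overlay

  # then run the mdadm --create against /dev/mapper/sdb3-overlay (and the
  # equivalent overlays for the other members) instead of the real devices
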
>
> Once you've run the create, run a "fsck -n" on the filesystem to check
> that the data looks okay. If not, the order or parameters may be
> incorrect - check the --examine output for any differences from the
> original results.
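
(I still have the --examine output I captured right after the original
failure, so once the create has run against the overlays I'll compare the
new superblocks against it before touching anything else. The file names
below are just what I'm using locally:)

  mdadm --examine /dev/mapper/*-overlay > examine-after-create.txt
  diff examine-before-failure.txt examine-after-create.txt

  # read-only filesystem check (assuming the filesystem sits directly on md0)
  fsck -n /dev/md0
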
>
> Cheers,
>     Robin
> --
>      ___
>     ( ' }     |       Robin Hill        <robin@xxxxxxxxxxxxxxx> |
>    / / )      | Little Jim says ....                            |
>   // !!       |      "He fallen in de water !!"                 |
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



