Re: On RAID5 read error during syncing - array .A.A

You're right! I just changed the order to sdd3 sdb3 sdc3 missing, and
fsck -n /dev/md0 reported that everything was clean.

Thanks a lot. I will back up my important files and write back a quick
summary of what we did to fix this situation.
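
For reference, the device order Robin worked out in the quoted text comes
from pairing each Device UUID with its Device Role in the earlier examine
output. A minimal sketch of that step, assuming the saved examine output is
plain (unquoted) text on stdin; examine_order is a hypothetical helper name,
not part of mdadm, and it only reads text, it never touches the disks:

```shell
#!/bin/sh
# Sketch: print "role device-uuid device" for each member found in saved
# `mdadm --examine` text, sorted by role. Reads the report from stdin.
examine_order() {
    awk '
        # A line such as "/dev/sdb3:" starts a new member block.
        /^\/dev\//      { dev = $1; sub(/:$/, "", dev) }
        /Device UUID :/ { uuid = $NF }
        /Device Role :/ {
            # "Active device N" carries the slot number; any other
            # value (for example "spare") has no slot in the array.
            role = ($4 == "Active") ? $NF : "spare"
            print role, uuid, dev
        }
    ' | LC_ALL=C sort   # plain byte order; fine for fewer than ten slots
}

# Possible use against live devices (needs root):
#   mdadm --examine /dev/sd[abcd]3 | examine_order
```

Device names can change between boots, so the UUID column is the one to
trust when matching the result against the current drives.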

On Tue, Dec 9, 2014 at 4:01 AM, Robin Hill <robin@xxxxxxxxxxxxxxx> wrote:
> On Tue Dec 09, 2014 at 12:35:14AM -0500, Emery Guevremont wrote:
>> >> >> >> On Mon, Dec 8, 2014 at 4:48 AM, Robin Hill <robin@xxxxxxxxxxxxxxx> wrote:
>> >> >> >> > On Sat Dec 06, 2014 at 03:49:10PM -0500, Emery Guevremont wrote:
>> >> >> >> >> > On Sat Dec 06, 2014 at 01:35:50pm -0500, Emery Guevremont wrote:
>> >> >> >> >> >
>> >> >> >> >> >> The long story and what I've done.
>> >> >> >> >> >>
>> >> >> >> >> >> /dev/md0 is assembled with 4 drives
>> >> >> >> >> >> /dev/sda3
>> >> >> >> >> >> /dev/sdb3
>> >> >> >> >> >> /dev/sdc3
>> >> >> >> >> >> /dev/sdd3
>> >> >> >> >> >>
>> >> >> >> >> >> 2 weeks ago, mdadm marked /dev/sda3 as failed. cat /proc/mdstat showed
>> >> >> >> >> >> _UUU. smartctl also confirmed that the drive was dying. So I shut down
>> >> >> >> >> >> the server until I received a replacement drive.
>> >> >> >> >> >>
>> >> >> >> >> >> This week, I replaced the dying drive with my new drive. Booted into
>> >> >> >> >> >> single user mode and did this:
>> >> >> >> >> >>
>> >> >> >> >> >> mdadm --manage /dev/md0 --add /dev/sda3
>> >> >> >> >> >>
>> >> >> >> >> >> A cat of /proc/mdstat confirmed the resyncing process. The last time I
>> >> >> >> >> >> checked, it was up to 11%. A few minutes later, I noticed that the
>> >> >> >> >> >> syncing had stopped. A read error message on /dev/sdd3 (have a pic of
>> >> >> >> >> >> it if interested) appeared on the console. It appears that /dev/sdd3
>> >> >> >> >> >> might be going bad. A cat /proc/mdstat showed _U_U. At that point I
>> >> >> >> >> >> panicked, and decided to leave everything as is and go to bed.
>> >> >> >> >> >>
>> >> >> >> >> >> The next day, I shut down the server and rebooted with a live USB
>> >> >> >> >> >> distro (Ubuntu rescue remix). After booting into the live distro, a
>> >> >> >> >> >> cat /proc/mdstat showed that my /dev/md0 was detected but all drives
>> >> >> >> >> >> had an (S) next to them, e.g. /dev/sda3 (S)... Naturally I don't like
>> >> >> >> >> >> the looks of this.
>> >> >> >> >> >>
>> >> >> >> >> >> I ran ddrescue to copy /dev/sdd onto my new replacement disk
>> >> >> >> >> >> (/dev/sda). Everything worked; ddrescue got only one read error, but
>> >> >> >> >> >> was eventually able to read the bad sector on a retry. I followed up
>> >> >> >> >> >> by also cloning sdb and sdc with ddrescue.
>> >> >> >> >> >>
>> >> >> >> >> >> So now I have cloned copies of sdb, sdc and sdd to work with.
>> >> >> >> >> >> Currently, running mdadm --assemble --scan will activate my array, but
>> >> >> >> >> >> all drives are added as spares. Running mdadm --examine on each drive
>> >> >> >> >> >> shows the same Array UUID, but Raid Devices is 0 and Raid Level is
>> >> >> >> >> >> -unknown- for some reason. The rest seems fine and makes sense. I
>> >> >> >> >> >> believe I could re-assemble my array if I could define the raid level
>> >> >> >> >> >> and raid devices.
>> >> >> >> >> >>
>> >> >> >> >> >> I wanted to know if there is a way to restore my superblocks from the
>> >> >> >> >> >> examine command I ran at the beginning? If not, what mdadm create
>> >> >> >> >> >> command should I run? Also please let me know if drive ordering is
>> >> >> >> >> >> important, and how I can determine this with the examine output I've
>> >> >> >> >> >> got?
>> >> >> >> >> >>
>> >> >> >> >> >> Thank you.
>> >> >> >> >> >>
>> >> >> >> >> You'll see from the examine output that raid level and devices aren't
>> >> >> >> >> defined; also notice the role of each drive. The examine output (I
>> >> >> >> >> attached 4 files) that I took right after the read error during the
>> >> >> >> >> syncing process seems to show a more accurate superblock. Here's also
>> >> >> >> >> the output of mdadm --detail /dev/md0 that I took when I got the first
>> >> >> >> >> error:
>> >> >> >> >>
>> >> >> >> >> ARRAY /dev/md/0 metadata=1.2 UUID=cf9db8fa:0c2bb553:46865912:704cceae
>> >> >> >> >> name=runts:0
>> >> >> >> >>    spares=1
>> >> >> >> >>
>> >> >> >> >>
>> >> >> >> >> Here's the output of how things currently are:
>> >> >> >> >>
>> >> >> >> >> mdadm --assemble --force /dev/md127 /dev/sdb3 /dev/sdc3 /dev/sdd3
>> >> >> >> >> mdadm: /dev/md127 assembled from 0 drives and 3 spares - not enough to
>> >> >> >> >> start the array.
>> >> >> >> >>
>> >> >> >> >> dmesg
>> >> >> >> >> [27903.423895] md: md127 stopped.
>> >> >> >> >> [27903.434327] md: bind<sdc3>
>> >> >> >> >> [27903.434767] md: bind<sdd3>
>> >> >> >> >> [27903.434963] md: bind<sdb3>
>> >> >> >> >>
>> >> >> >> >> cat /proc/mdstat
>> >> >> >> >> root@ubuntu:~# cat /proc/mdstat
>> >> >> >> >> Personalities : [raid6] [raid5] [raid4] [linear] [multipath] [raid0]
>> >> >> >> >> [raid1] [raid10]
>> >> >> >> >> md127 : inactive sdb3[4](S) sdd3[0](S) sdc3[5](S)
>> >> >> >> >>       5858387208 blocks super 1.2
>> >> >> >> >>
>> >> >> >> >> mdadm --examine /dev/sd[bcd]3
>> >> >> >> >> /dev/sdb3:
>> >> >> >> >>           Magic : a92b4efc
>> >> >> >> >>         Version : 1.2
>> >> >> >> >>     Feature Map : 0x0
>> >> >> >> >>      Array UUID : cf9db8fa:0c2bb553:46865912:704cceae
>> >> >> >> >>            Name : runts:0
>> >> >> >> >>   Creation Time : Tue Jul 26 03:27:39 2011
>> >> >> >> >>      Raid Level : -unknown-
>> >> >> >> >>    Raid Devices : 0
>> >> >> >> >>
>> >> >> >> >>  Avail Dev Size : 3905591472 (1862.33 GiB 1999.66 GB)
>> >> >> >> >>     Data Offset : 2048 sectors
>> >> >> >> >>    Super Offset : 8 sectors
>> >> >> >> >>           State : active
>> >> >> >> >>     Device UUID : b2bf0462:e0722254:0e233a72:aa5df4da
>> >> >> >> >>
>> >> >> >> >>     Update Time : Sat Dec  6 12:46:40 2014
>> >> >> >> >>        Checksum : 5e8cfc9a - correct
>> >> >> >> >>          Events : 1
>> >> >> >> >>
>> >> >> >> >>
>> >> >> >> >>    Device Role : spare
>> >> >> >> >>    Array State :  ('A' == active, '.' == missing)
>> >> >> >> >> /dev/sdc3:
>> >> >> >> >>           Magic : a92b4efc
>> >> >> >> >>         Version : 1.2
>> >> >> >> >>     Feature Map : 0x0
>> >> >> >> >>      Array UUID : cf9db8fa:0c2bb553:46865912:704cceae
>> >> >> >> >>            Name : runts:0
>> >> >> >> >>   Creation Time : Tue Jul 26 03:27:39 2011
>> >> >> >> >>      Raid Level : -unknown-
>> >> >> >> >>    Raid Devices : 0
>> >> >> >> >>
>> >> >> >> >>  Avail Dev Size : 3905591472 (1862.33 GiB 1999.66 GB)
>> >> >> >> >>     Data Offset : 2048 sectors
>> >> >> >> >>    Super Offset : 8 sectors
>> >> >> >> >>           State : active
>> >> >> >> >>     Device UUID : 390bd4a2:07a28c01:528ed41e:a9d0fcf0
>> >> >> >> >>
>> >> >> >> >>     Update Time : Sat Dec  6 12:46:40 2014
>> >> >> >> >>        Checksum : f69518c - correct
>> >> >> >> >>          Events : 1
>> >> >> >> >>
>> >> >> >> >>
>> >> >> >> >>    Device Role : spare
>> >> >> >> >>    Array State :  ('A' == active, '.' == missing)
>> >> >> >> >> /dev/sdd3:
>> >> >> >> >>           Magic : a92b4efc
>> >> >> >> >>         Version : 1.2
>> >> >> >> >>     Feature Map : 0x0
>> >> >> >> >>      Array UUID : cf9db8fa:0c2bb553:46865912:704cceae
>> >> >> >> >>            Name : runts:0
>> >> >> >> >>   Creation Time : Tue Jul 26 03:27:39 2011
>> >> >> >> >>      Raid Level : -unknown-
>> >> >> >> >>    Raid Devices : 0
>> >> >> >> >>
>> >> >> >> >>  Avail Dev Size : 3905591472 (1862.33 GiB 1999.66 GB)
>> >> >> >> >>     Data Offset : 2048 sectors
>> >> >> >> >>    Super Offset : 8 sectors
>> >> >> >> >>           State : active
>> >> >> >> >>     Device UUID : 92589cc2:9d5ed86c:1467efc2:2e6b7f09
>> >> >> >> >>
>> >> >> >> >>     Update Time : Sat Dec  6 12:46:40 2014
>> >> >> >> >>        Checksum : 571ad2bd - correct
>> >> >> >> >>          Events : 1
>> >> >> >> >>
>> >> >> >> >>
>> >> >> >> >>    Device Role : spare
>> >> >> >> >>    Array State :  ('A' == active, '.' == missing)
>> >> >> >> >>
>> >> >> >> >> and finally kernel and mdadm versions:
>> >> >> >> >>
>> >> >> >> >> uname -a
>> >> >> >> >> Linux ubuntu 3.2.0-23-generic #36-Ubuntu SMP Tue Apr 10 20:41:14 UTC
>> >> >> >> >> 2012 i686 i686 i386 GNU/Linux
>> >> >> >> >>
>> >> >> >> >> mdadm -V
>> >> >> >> >> mdadm - v3.2.3 - 23rd December 2011
>> >> >> >> >
>> >> >> >> >> /dev/sda3:
>> >> >> >> >>           Magic : a92b4efc
>> >> >> >> >>         Version : 1.2
>> >> >> >> >>     Feature Map : 0x0
>> >> >> >> >>      Array UUID : cf9db8fa:0c2bb553:46865912:704cceae
>> >> >> >> >>            Name : runts:0  (local to host runts)
>> >> >> >> >>   Creation Time : Mon Jul 25 23:27:39 2011
>> >> >> >> >>      Raid Level : raid5
>> >> >> >> >>    Raid Devices : 4
>> >> >> >> >>
>> >> >> >> >>  Avail Dev Size : 3905591472 (1862.33 GiB 1999.66 GB)
>> >> >> >> >>      Array Size : 5858385408 (5586.99 GiB 5998.99 GB)
>> >> >> >> >>   Used Dev Size : 3905590272 (1862.33 GiB 1999.66 GB)
>> >> >> >> >>     Data Offset : 2048 sectors
>> >> >> >> >>    Super Offset : 8 sectors
>> >> >> >> >>           State : clean
>> >> >> >> >>     Device UUID : b2bf0462:e0722254:0e233a72:aa5df4da
>> >> >> >> >>
>> >> >> >> >>     Update Time : Tue Dec  2 23:15:37 2014
>> >> >> >> >>        Checksum : 5ed5b898 - correct
>> >> >> >> >>          Events : 3925676
>> >> >> >> >>
>> >> >> >> >>          Layout : left-symmetric
>> >> >> >> >>      Chunk Size : 512K
>> >> >> >> >>
>> >> >> >> >>    Device Role : spare
>> >> >> >> >>    Array State : A.A. ('A' == active, '.' == missing)
>> >> >> >> >
>> >> >> >> >> /dev/sdb3:
>> >> >> >> >>           Magic : a92b4efc
>> >> >> >> >>         Version : 1.2
>> >> >> >> >>     Feature Map : 0x0
>> >> >> >> >>      Array UUID : cf9db8fa:0c2bb553:46865912:704cceae
>> >> >> >> >>            Name : runts:0  (local to host runts)
>> >> >> >> >>   Creation Time : Mon Jul 25 23:27:39 2011
>> >> >> >> >>      Raid Level : raid5
>> >> >> >> >>    Raid Devices : 4
>> >> >> >> >>
>> >> >> >> >>  Avail Dev Size : 3905591472 (1862.33 GiB 1999.66 GB)
>> >> >> >> >>      Array Size : 5858385408 (5586.99 GiB 5998.99 GB)
>> >> >> >> >>   Used Dev Size : 3905590272 (1862.33 GiB 1999.66 GB)
>> >> >> >> >>     Data Offset : 2048 sectors
>> >> >> >> >>    Super Offset : 8 sectors
>> >> >> >> >>           State : clean
>> >> >> >> >>     Device UUID : 92589cc2:9d5ed86c:1467efc2:2e6b7f09
>> >> >> >> >>
>> >> >> >> >>     Update Time : Tue Dec  2 23:15:37 2014
>> >> >> >> >>        Checksum : 57638ebb - correct
>> >> >> >> >>          Events : 3925676
>> >> >> >> >>
>> >> >> >> >>          Layout : left-symmetric
>> >> >> >> >>      Chunk Size : 512K
>> >> >> >> >>
>> >> >> >> >>    Device Role : Active device 0
>> >> >> >> >>    Array State : A.A. ('A' == active, '.' == missing)
>> >> >> >> >
>> >> >> >> >> /dev/sdc3:
>> >> >> >> >>           Magic : a92b4efc
>> >> >> >> >>         Version : 1.2
>> >> >> >> >>     Feature Map : 0x0
>> >> >> >> >>      Array UUID : cf9db8fa:0c2bb553:46865912:704cceae
>> >> >> >> >>            Name : runts:0  (local to host runts)
>> >> >> >> >>   Creation Time : Mon Jul 25 23:27:39 2011
>> >> >> >> >>      Raid Level : raid5
>> >> >> >> >>    Raid Devices : 4
>> >> >> >> >>
>> >> >> >> >>  Avail Dev Size : 3905591472 (1862.33 GiB 1999.66 GB)
>> >> >> >> >>      Array Size : 5858385408 (5586.99 GiB 5998.99 GB)
>> >> >> >> >>   Used Dev Size : 3905590272 (1862.33 GiB 1999.66 GB)
>> >> >> >> >>     Data Offset : 2048 sectors
>> >> >> >> >>    Super Offset : 8 sectors
>> >> >> >> >>           State : clean
>> >> >> >> >>     Device UUID : 390bd4a2:07a28c01:528ed41e:a9d0fcf0
>> >> >> >> >>
>> >> >> >> >>     Update Time : Tue Dec  2 23:15:37 2014
>> >> >> >> >>        Checksum : fb20d8a - correct
>> >> >> >> >>          Events : 3925676
>> >> >> >> >>
>> >> >> >> >>          Layout : left-symmetric
>> >> >> >> >>      Chunk Size : 512K
>> >> >> >> >>
>> >> >> >> >>    Device Role : Active device 2
>> >> >> >> >>    Array State : A.A. ('A' == active, '.' == missing)
>> >> >> >> >
>> >> >> >> >> /dev/sdd3:
>> >> >> >> >>           Magic : a92b4efc
>> >> >> >> >>         Version : 1.2
>> >> >> >> >>     Feature Map : 0x0
>> >> >> >> >>      Array UUID : cf9db8fa:0c2bb553:46865912:704cceae
>> >> >> >> >>            Name : runts:0  (local to host runts)
>> >> >> >> >>   Creation Time : Mon Jul 25 23:27:39 2011
>> >> >> >> >>      Raid Level : raid5
>> >> >> >> >>    Raid Devices : 4
>> >> >> >> >>
>> >> >> >> >>  Avail Dev Size : 3905591472 (1862.33 GiB 1999.66 GB)
>> >> >> >> >>      Array Size : 5858385408 (5586.99 GiB 5998.99 GB)
>> >> >> >> >>   Used Dev Size : 3905590272 (1862.33 GiB 1999.66 GB)
>> >> >> >> >>     Data Offset : 2048 sectors
>> >> >> >> >>    Super Offset : 8 sectors
>> >> >> >> >>           State : clean
>> >> >> >> >>     Device UUID : 4156ab46:bd42c10d:8565d5af:74856641
>> >> >> >> >>
>> >> >> >> >>     Update Time : Tue Dec  2 23:14:03 2014
>> >> >> >> >>        Checksum : a126853f - correct
>> >> >> >> >>          Events : 3925672
>> >> >> >> >>
>> >> >> >> >>          Layout : left-symmetric
>> >> >> >> >>      Chunk Size : 512K
>> >> >> >> >>
>> >> >> >> >>    Device Role : Active device 1
>> >> >> >> >>    Array State : AAAA ('A' == active, '.' == missing)
>> >> >> >> >
>> >> >> >> > At least you have the previous data anyway, which should allow
>> >> >> >> > reconstruction of the array. The device names have changed between your
>> >> >> >> > two reports though, so I'd advise double-checking which is which before
>> >> >> >> > proceeding.
>> >> >> >> >
>> >> >> >> > The reports indicate that the original array order (based on the device
>> >> >> >> > role field) for the four devices was (using device UUIDs as they're
>> >> >> >> > consistent):
>> >> >> >> >     92589cc2:9d5ed86c:1467efc2:2e6b7f09
>> >> >> >> >     4156ab46:bd42c10d:8565d5af:74856641
>> >> >> >> >     390bd4a2:07a28c01:528ed41e:a9d0fcf0
>> >> >> >> >     b2bf0462:e0722254:0e233a72:aa5df4da
>> >> >> >> >
>> >> >> >> > That would give a current device order of sdd3,sda3,sdc3,sdb3 (I don't
>> >> >> >> > have the current data for sda3, but that's the only missing UUID).
>> >> >> >> >
>> I had forgotten that I took a pic of the read error message, which
>> also contained an output of /proc/mdstat, so I was able to determine
>> the ordering and I ran this command:
>>
> What did that indicate, and how did you map it to the device order below?
>
>> root@ubuntu:~# mdadm -v --create --assume-clean --level=5 --chunk=512
>> --size=1952795136 --raid-devices=4 /dev/md0 /dev/sdd3 /dev/sdb3
>> missing /dev/sdc3
>> mdadm: layout defaults to left-symmetric
>> mdadm: layout defaults to left-symmetric
>> mdadm: /dev/sdd3 appears to be part of a raid array:
>>     level=raid5 devices=4 ctime=Tue Dec  9 05:17:53 2014
>> mdadm: layout defaults to left-symmetric
>> mdadm: /dev/sdb3 appears to be part of a raid array:
>>     level=raid5 devices=4 ctime=Tue Dec  9 05:17:53 2014
>> mdadm: layout defaults to left-symmetric
>> mdadm: /dev/sdc3 appears to be part of a raid array:
>>     level=raid5 devices=4 ctime=Tue Dec  9 05:17:53 2014
>> Continue creating array? y
>> mdadm: Defaulting to version 1.2 metadata
>> mdadm: array /dev/md0 started.
>>
>> I did mdadm -E and everything seemed to be consistent with the
>> original output of the examine command. So I ran fsck -n
>>
>> root@ubuntu:~# fsck -n /dev/md0
>> fsck from util-linux 2.20.1
>> e2fsck 1.42 (29-Nov-2011)
>> fsck.ext4: Group descriptors look bad... trying backup blocks...
>> Error writing block 1 (Attempt to write block to filesystem resulted
>> in short write).  Ignore error? no
>>
>> Error writing block 2 (Attempt to write block to filesystem resulted
>> in short write).  Ignore error? no
>>
>> Error writing block 3 (Attempt to write block to filesystem resulted
>> in short write).  Ignore error? no
>>
>> Error writing block 4 (Attempt to write block to filesystem resulted
>> in short write).  Ignore error? no
>>
>> Error writing block 5 (Attempt to write block to filesystem resulted
>> in short write).  Ignore error? no
>>
>> Error writing block 6 (Attempt to write block to filesystem resulted
>> in short write).  Ignore error? no
>> ...
>> ...
>> Error writing block 343 (Attempt to write block to filesystem resulted
>> in short write).  Ignore error? no
>>
>> Error writing block 344 (Attempt to write block to filesystem resulted
>> in short write).  Ignore error? no
>>
>> fsck.ext4: Device or resource busy while trying to open /dev/md0
>> Filesystem mounted or opened exclusively by another program?
>>
>>
>> I believe I made some progress. But before I continue, I wanted to
>> know if I was on the right track?
>>
>> I tried to mount /dev/md0 but got this:
>>
>> root@ubuntu:~# mount -t ext4 /dev/md0 /mnt/
>> mount: wrong fs type, bad option, bad superblock on /dev/md0,
>>        missing codepage or helper program, or other error
>>        In some cases useful info is found in syslog - try
>>        dmesg | tail  or so
>>
>> Am I at a point to run fsck to repair the ext4 superblock?
>>
> No, that output would definitely suggest you have the wrong order.
> That looks to be far too many errors for a normal unclean shutdown
> situation.
>
>> I also tried a different ordering to see what fsck -n would give and I got:
>>
>> root@ubuntu:~# fsck -n /dev/md0
>> fsck from util-linux 2.20.1
>> e2fsck 1.42 (29-Nov-2011)
>> fsck.ext4: Filesystem revision too high while trying to open /dev/md0
>> The filesystem revision is apparently too high for this version of e2fsck.
>> (Or the filesystem superblock is corrupt)
>>
>>
>> The superblock could not be read or does not describe a correct ext2
>> filesystem.  If the device is valid and it really contains an ext2
>> filesystem (and not swap or ufs or something else), then the superblock
>> is corrupt, and you might try running e2fsck with an alternate superblock:
>>     e2fsck -b 8193 <device>
>>
>> Which seems to confirm my first attempt at the ordering was good.
>>
> No, it confirms that the first device was correct - the filesystem
> superblock will be entirely within the first chunk, so only the first
> disk needs to be correct for that to be readable.
>
> Have you tried running it in the order I advised (sdd3, sda3, sdc3,
> missing) or in the order of the UUIDs (if the device order has changed)?
>      92589cc2:9d5ed86c:1467efc2:2e6b7f09
>      4156ab46:bd42c10d:8565d5af:74856641
>      390bd4a2:07a28c01:528ed41e:a9d0fcf0
>      b2bf0462:e0722254:0e233a72:aa5df4da
>
> If not, please do so first and see whether the fsck output is any
> better.
>
> Cheers,
>     Robin
> --
>      ___
>     ( ' }     |       Robin Hill        <robin@xxxxxxxxxxxxxxx> |
>    / / )      | Little Jim says ....                            |
>   // !!       |      "He fallen in de water !!"                 |