Re: raid6 recovery

On Fri, 14 Jan 2011 17:16:26 +0100 Björn Englund <be@xxxxxxxxxxx> wrote:

> Hi.
> 
> After a loss of communication with a drive in a 10 disk raid6 the disk
> was dropped out of the raid.
> 
> I added it again with
> mdadm /dev/md16 --add /dev/sdbq1
> 
> The array resynced and I used the xfs filesystem on top of the raid.
> 
> After a while I started noticing filesystem errors.
> 
> I did
> echo check > /sys/block/md16/md/sync_action
> 
> I got a lot of errors in /sys/block/md16/md/mismatch_cnt
> 
> I failed and removed the disk I added before from the array.
> 
> Did a check again (on the 9/10 array)
> echo check > /sys/block/md16/md/sync_action
> 
> No errors  /sys/block/md16/md/mismatch_cnt
> 
> Wiped the superblock from /dev/sdbq1 and added it again to the array.
> Let it finish resyncing.
> Did a check and once again a lot of errors.

That is obviously very bad.  After the recovery it may well report a large
number in mismatch_cnt, but if you then do a 'check' the number should go to
zero and stay there.

Did you interrupt the recovery at all, or did it run to completion without
any interference?   What kernel version are you using?
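For reference, the check-then-inspect workflow being discussed can be sketched as follows. On a real system the directory would be /sys/block/md16/md and the md driver would update the files itself; here a temporary directory stands in for sysfs and the driver's side is simulated, so the sketch is self-contained rather than a live test:

```python
import os
import tempfile

# On a real system: MD = "/sys/block/md16/md".  A temp dir stands in
# for sysfs here so the sketch can run anywhere.
MD = tempfile.mkdtemp()

def write_attr(name, value):
    with open(os.path.join(MD, name), "w") as f:
        f.write(value + "\n")

def read_attr(name):
    with open(os.path.join(MD, name)) as f:
        return f.read().strip()

# Request a parity check; on a real array the md driver picks this up,
# scans every stripe, and counts parity mismatches.
write_attr("sync_action", "check")

# Simulate the driver finishing a clean check.
write_attr("sync_action", "idle")
write_attr("mismatch_cnt", "0")

# After 'check' completes, a healthy array should report 0 mismatches.
print(read_attr("mismatch_cnt"))
```

On a live array you would poll sync_action until it returns to "idle" before reading mismatch_cnt.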

> 
> The drive now has slot 10 instead of slot 3 which it had before the
> first error.

This is normal.  When you wiped the superblock, md thought it was a new device
and gave it a new number in the array.  It still filled the same role though.


> 
> Examining each device (see below) shows 11 slots and one failed?
> (0, 1, 2, failed, 4, 5, 6, 7, 8, 9, 3) ?

These numbers are confusing, but they are correct and suggest the array is
whole and working.
Newer versions of mdadm report this in a less confusing way.
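To illustrate the numbering: in the "Array Slot" line, the value before the parentheses is this device's number, and the parenthesised list maps each device number the array has ever assigned to its raid role. A small parsing sketch (a hypothetical helper, not part of mdadm):

```python
def parse_array_slot(line):
    # line looks like: "10 (0, 1, 2, failed, 4, 5, 6, 7, 8, 9, 3)"
    slot, rest = line.split(" ", 1)
    # index into the list = device number, value = raid role
    # ("failed" marks a device number that has left the array)
    roles = [r.strip() for r in rest.strip("()").split(",")]
    return int(slot), roles

slot, roles = parse_array_slot("10 (0, 1, 2, failed, 4, 5, 6, 7, 8, 9, 3)")
print(slot, roles[slot], roles[3])
```

So device number 10 now fills raid role 3, while the original occupant of device number 3 is recorded as failed, which is why the array is still whole despite the odd-looking output.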

I'm afraid I cannot suggest what the root problem is.  It seems like
something seriously wrong with IO to the device, but if that is the case you
would expect other errors...

NeilBrown


> 
> 
> Any idea what is going on?
> 
> mdadm --version
> mdadm - v2.6.9 - 10th March 2009
> 
> Centos 5.5
> 
> 
> mdadm -D /dev/md16
> /dev/md16:
>         Version : 1.01
>   Creation Time : Thu Nov 25 09:15:54 2010
>      Raid Level : raid6
>      Array Size : 7809792000 (7448.00 GiB 7997.23 GB)
>   Used Dev Size : 976224000 (931.00 GiB 999.65 GB)
>    Raid Devices : 10
>   Total Devices : 10
> Preferred Minor : 16
>     Persistence : Superblock is persistent
> 
>     Update Time : Fri Jan 14 16:22:10 2011
>           State : clean
>  Active Devices : 10
> Working Devices : 10
>  Failed Devices : 0
>   Spare Devices : 0
> 
>      Chunk Size : 256K
> 
>            Name : 16
>            UUID : fcd585d0:f2918552:7090d8da:532927c8
>          Events : 90
> 
>     Number   Major   Minor   RaidDevice State
>        0       8      145        0      active sync   /dev/sdj1
>        1      65        1        1      active sync   /dev/sdq1
>        2      65       17        2      active sync   /dev/sdr1
>       10      68       65        3      active sync   /dev/sdbq1
>        4      65       49        4      active sync   /dev/sdt1
>        5      65       65        5      active sync   /dev/sdu1
>        6      65      113        6      active sync   /dev/sdx1
>        7      65      129        7      active sync   /dev/sdy1
>        8      65       33        8      active sync   /dev/sds1
>        9      65      145        9      active sync   /dev/sdz1
> 
> 
> 
> mdadm -E /dev/sdj1
> /dev/sdj1:
>           Magic : a92b4efc
>         Version : 1.1
>     Feature Map : 0x0
>      Array UUID : fcd585d0:f2918552:7090d8da:532927c8
>            Name : 16
>   Creation Time : Thu Nov 25 09:15:54 2010
>      Raid Level : raid6
>    Raid Devices : 10
> 
>  Avail Dev Size : 1952448248 (931.00 GiB 999.65 GB)
>      Array Size : 15619584000 (7448.00 GiB 7997.23 GB)
>   Used Dev Size : 1952448000 (931.00 GiB 999.65 GB)
>     Data Offset : 264 sectors
>    Super Offset : 0 sectors
>           State : clean
>     Device UUID : 5db9c8f7:ce5b375e:757c53d0:04e89a06
> 
>     Update Time : Fri Jan 14 16:22:10 2011
>        Checksum : 1f17a675 - correct
>          Events : 90
> 
>      Chunk Size : 256K
> 
>     Array Slot : 0 (0, 1, 2, failed, 4, 5, 6, 7, 8, 9, 3)
>    Array State : Uuuuuuuuuu 1 failed
> 
> 
> 
> mdadm -E /dev/sdq1
> /dev/sdq1:
>           Magic : a92b4efc
>         Version : 1.1
>     Feature Map : 0x0
>      Array UUID : fcd585d0:f2918552:7090d8da:532927c8
>            Name : 16
>   Creation Time : Thu Nov 25 09:15:54 2010
>      Raid Level : raid6
>    Raid Devices : 10
> 
>  Avail Dev Size : 1952448248 (931.00 GiB 999.65 GB)
>      Array Size : 15619584000 (7448.00 GiB 7997.23 GB)
>   Used Dev Size : 1952448000 (931.00 GiB 999.65 GB)
>     Data Offset : 264 sectors
>    Super Offset : 0 sectors
>           State : clean
>     Device UUID : fb113255:fda391a6:7368a42b:1d6d4655
> 
>     Update Time : Fri Jan 14 16:22:10 2011
>        Checksum : 6ed7b859 - correct
>          Events : 90
> 
>      Chunk Size : 256K
> 
>     Array Slot : 1 (0, 1, 2, failed, 4, 5, 6, 7, 8, 9, 3)
>    Array State : uUuuuuuuuu 1 failed
> 
> 
>  mdadm -E /dev/sdr1
> /dev/sdr1:
>           Magic : a92b4efc
>         Version : 1.1
>     Feature Map : 0x0
>      Array UUID : fcd585d0:f2918552:7090d8da:532927c8
>            Name : 16
>   Creation Time : Thu Nov 25 09:15:54 2010
>      Raid Level : raid6
>    Raid Devices : 10
> 
>  Avail Dev Size : 1952448248 (931.00 GiB 999.65 GB)
>      Array Size : 15619584000 (7448.00 GiB 7997.23 GB)
>   Used Dev Size : 1952448000 (931.00 GiB 999.65 GB)
>     Data Offset : 264 sectors
>    Super Offset : 0 sectors
>           State : clean
>     Device UUID : afcb4dd8:2aa58944:40a32ed9:eb6178af
> 
>     Update Time : Fri Jan 14 16:22:10 2011
>        Checksum : 97a7a2d7 - correct
>          Events : 90
> 
>      Chunk Size : 256K
> 
>     Array Slot : 2 (0, 1, 2, failed, 4, 5, 6, 7, 8, 9, 3)
>    Array State : uuUuuuuuuu 1 failed
> 
> 
> mdadm -E /dev/sdbq1
> /dev/sdbq1:
>           Magic : a92b4efc
>         Version : 1.1
>     Feature Map : 0x0
>      Array UUID : fcd585d0:f2918552:7090d8da:532927c8
>            Name : 16
>   Creation Time : Thu Nov 25 09:15:54 2010
>      Raid Level : raid6
>    Raid Devices : 10
> 
>  Avail Dev Size : 1952448248 (931.00 GiB 999.65 GB)
>      Array Size : 15619584000 (7448.00 GiB 7997.23 GB)
>   Used Dev Size : 1952448000 (931.00 GiB 999.65 GB)
>     Data Offset : 264 sectors
>    Super Offset : 0 sectors
>           State : clean
>     Device UUID : 93c6ae7c:d8161356:7ada1043:d0c5a924
> 
>     Update Time : Fri Jan 14 16:22:10 2011
>        Checksum : 2ca5aa8f - correct
>          Events : 90
> 
>      Chunk Size : 256K
> 
>     Array Slot : 10 (0, 1, 2, failed, 4, 5, 6, 7, 8, 9, 3)
>    Array State : uuuUuuuuuu 1 failed
> 
> 
> and so on for the rest of the drives.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


