Re: strange behavior of a raid5 array after system crash

Hello Hans,

I would try to re-add the out-of-sync disk (hde10) to the degraded
raid5 array (md4).  If hde10 gets kicked out again, it's time to
replace it with another disk.
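
Something like the following should do it; this is only a sketch,
assuming the array is still assembled and running degraded as /dev/md4
(adjust the device names to your setup):

# mdadm /dev/md4 --add /dev/hde10
# cat /proc/mdstat

The first command puts hde10 back into the array as a rebuild target
and should trigger a resync onto it; the second lets you watch the
recovery progress.  If the resync aborts or hde10 drops out again,
check dmesg for I/O errors on that drive before trusting it further.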

--
Regards,
Mike T.

On Fri, 2005-03-04 at 13:42, hpg@xxxxxxxxxxxxx wrote:
> Hello everyone,
> 
> I need your help with some strange behavior of a raid5 array.
> 
> My Linux fileserver froze for an unknown reason: no mouse movement,
> no console response, no disk activity, nothing.
> So I had to hit the reset button.
> 
> At boot time, five raid5 arrays came up active without any faults.
> Two other raid5 arrays resynchronized successfully.
> Only one had trouble recovering.
> 
> Because I am using LVM2 on top of all my raid5 arrays, and the root
> filesystem lives in the volume group built on the raid5 array in
> question, I had to boot from a Fedora Core 3 rescue CDROM.
> 
> # uname -a 
> Linux localhost.localdomain 2.6.9-1.667 #1 Tue Nov 2 14:41:31 EST 2004
> i686 unknown
> 
> At boot time I get the following:
> 
> [...]
> md: autorun ...
> md: considering hdi7 ...
> md:  adding hdi7 ...
> md:  adding hdk9 ...
> md:  adding hdg5 ...
> md:  adding hde10 ...
> md:  adding hda11 ...
> md: created md4
> md: bind<hda11>
> md: bind<hde10>
> md: bind<hdg5>
> md: bind<hdk9>
> md: bind<hdi7>
> md: running: <hdi7><hdk9><hdg5><hde10><hda11>
> md: kicking non-fresh hde10 from array!
> md: unbind<hde10>
> md: export_rdev(hde10)
> md: md4: raid array is not clean -- starting background reconstruction
> raid5: device hdi7 operational as raid disk 4
> raid5: device hdk9 operational as raid disk 3
> raid5: device hdg5 operational as raid disk 2
> raid5: device hda11 operational as raid disk 0
> raid5: cannot start dirty degraded array for md4
> RAID5 conf printout:
>  --- rd:5 wd:4 fd:1
>  disk 0, o:1, dev:hda11
>  disk 2, o:1, dev:hdg5
>  disk 3, o:1, dev:hdk9
>  disk 4, o:1, dev:hdi7
> raid5: failed to run raid set md4
> md: pers->run() failed ...
> md: do_md_run() returned -22
> md: md4 stopped.
> md: unbind<hdi7>
> md: export_rdev(hdi7)
> md: unbind<hdk9>
> md: export_rdev(hdk9)
> md: unbind<hdg5>
> md: export_rdev(hdg5)
> md: unbind<hda11>
> md: export_rdev(hda11)
> md: ... autorun DONE.
> [...]
> 
> So I tried to reassemble the array:
> 
> # mdadm --assemble /dev/md4 /dev/hda11 /dev/hde10 /dev/hdg5 /dev/hdk9
> /dev/hdi7
> mdadm: /dev/md4 assembled from 4 drives - need all 5 to start it (use
> --run to insist)
> 
> # dmesg
> [...]
> md: md4 stopped.
> md: bind<hde10>
> md: bind<hdg5>
> md: bind<hdk9>
> md: bind<hdi7>
> md: bind<hda11>
> 
> # cat /proc/mdstat
> Personalities : [raid0] [raid1] [raid5] [raid6]
> md1 : active raid5 hdi1[4] hdk1[3] hdg1[2] hde7[1] hda3[0]
>       81919744 blocks level 5, 64k chunk, algorithm 2 [5/5] [UUUUU]
> 
> md2 : active raid5 hdi2[4] hdk2[3] hdg2[2] hde8[1] hda5[0]
>       81919744 blocks level 5, 64k chunk, algorithm 2 [5/5] [UUUUU]
> 
> md3 : active raid5 hdi3[4] hdk3[3] hdg3[2] hde9[1] hda6[0]
>       81919744 blocks level 5, 64k chunk, algorithm 2 [5/5] [UUUUU]
> 
> md4 : inactive hda11[0] hdi7[4] hdk9[3] hdg5[2] hde10[1]
>       65246272 blocks
> 
> md5 : active raid5 hdl5[3] hdi5[2] hdk5[1] hda7[0]
>       61439808 blocks level 5, 64k chunk, algorithm 0 [4/4] [UUUU]
> 
> md6 : active raid5 hdl6[3] hdi6[2] hdk6[1] hda8[0]
>       61439808 blocks level 5, 64k chunk, algorithm 0 [4/4] [UUUU]
> 
> md7 : active raid5 hdl7[2] hdk7[1] hda9[0]
>       40965504 blocks level 5, 64k chunk, algorithm 0 [3/3] [UUU]
> 
> md8 : active raid5 hdl8[2] hdk8[1] hda10[0]
>       40965504 blocks level 5, 64k chunk, algorithm 0 [3/3] [UUU]
> 
> unused devices: <none>
> 
> 
> # mdadm --stop /dev/md4
> # mdadm --assemble --run /dev/md4 /dev/hda11 /dev/hde10 /dev/hdg5
> /dev/hdk9 /dev/hdi7
> mdadm: /dev/md4 has been started with 4 drives (out of 5).
> 
> # cat /proc/mdstat
> [...]
> md4 : active raid5 hda11[0] hdi7[4] hdk9[3] hdg5[2]
>       49126144 blocks level 5, 64k chunk, algorithm 2 [5/4] [U_UUU]
> [...]
> 
> # dmesg
> [...]
> md: bind<hde10>
> md: bind<hdg5>
> md: bind<hdk9>
> md: bind<hdi7>
> md: bind<hda11>
> md: kicking non-fresh hde10 from array!
> md: unbind<hde10>
> md: export_rdev(hde10)
> raid5: device hda11 operational as raid disk 0
> raid5: device hdi7 operational as raid disk 4
> raid5: device hdk9 operational as raid disk 3
> raid5: device hdg5 operational as raid disk 2
> raid5: allocated 5248kB for md4
> raid5: raid level 5 set md4 active with 4 out of 5 devices, algorithm 2
> RAID5 conf printout:
>  --- rd:5 wd:4 fd:1
>  disk 0, o:1, dev:hda11
>  disk 2, o:1, dev:hdg5
>  disk 3, o:1, dev:hdk9
>  disk 4, o:1, dev:hdi7
> 
> 
> So far everything looks OK to me.
> But now things get strange:
> 
> # dd if=/dev/md4 of=/dev/null
> 0+0 records in
> 0+0 records out
> 
> # mdadm --stop /dev/md4
> mdadm: fail to stop array /dev/md4: Device or resource busy
> 
> # dmesg
> [...]
> md: md4 still in use.
> 
> # dd if=/dev/hda11 of=/dev/null count=1000
> 1000+0 records in
> 1000+0 records out
> # dd if=/dev/hde10 of=/dev/null count=1000
> 1000+0 records in
> 1000+0 records out
> # dd if=/dev/hdg5 of=/dev/null count=1000
> 1000+0 records in
> 1000+0 records out
> # dd if=/dev/hdi7 of=/dev/null count=1000
> 1000+0 records in
> 1000+0 records out
> # dd if=/dev/hdk9 of=/dev/null count=1000
> 1000+0 records in
> 1000+0 records out
> # dd if=/dev/md1 of=/dev/null count=1000
> 1000+0 records in
> 1000+0 records out
> # dd if=/dev/md2 of=/dev/null count=1000
> 1000+0 records in
> 1000+0 records out
> # dd if=/dev/md3 of=/dev/null count=1000
> 1000+0 records in
> 1000+0 records out
> # dd if=/dev/md5 of=/dev/null count=1000
> 1000+0 records in
> 1000+0 records out
> # dd if=/dev/md6 of=/dev/null count=1000
> 1000+0 records in
> 1000+0 records out
> # dd if=/dev/md7 of=/dev/null count=1000
> 1000+0 records in
> 1000+0 records out
> # dd if=/dev/md8 of=/dev/null count=1000
> 1000+0 records in
> 1000+0 records out
> 
> 
> Now for some further details:
> 
> # mdadm --detail /dev/md4
> /dev/md4:
>         Version : 00.90.01
>   Creation Time : Sat Jul 24 12:38:25 2004
>      Raid Level : raid5
>     Device Size : 12281536 (11.71 GiB 12.58 GB)
>    Raid Devices : 5
>   Total Devices : 4
> Preferred Minor : 4
>     Persistence : Superblock is persistent
> 
>     Update Time : Mon Feb 28 21:10:13 2005
>           State : clean, degraded
>  Active Devices : 4
> Working Devices : 4
>  Failed Devices : 0
>   Spare Devices : 0
> 
>          Layout : left-symmetric
>      Chunk Size : 64K
> 
>     Number   Major   Minor   RaidDevice State
>        0       3       11        0      active sync   /dev/hda11
>        1       0        0       -1      removed
>        2      34        5        2      active sync   /dev/hdg5
>        3      57        9        3      active sync   /dev/hdk9
>        4      56        7        4      active sync   /dev/hdi7
>            UUID : 1da63142:e1bcc45b:e0287a1a:f9c7c3a8
>          Events : 0.26324
> 
> # mdadm --examine /dev/hda11 /dev/hde10 /dev/hdg5 /dev/hdi7 /dev/hdk9
> /dev/hda11:
>           Magic : a92b4efc
>         Version : 00.90.00
>            UUID : 1da63142:e1bcc45b:e0287a1a:f9c7c3a8
>   Creation Time : Sat Jul 24 12:38:25 2004
>      Raid Level : raid5
>     Device Size : 12281536 (11.71 GiB 12.58 GB)
>    Raid Devices : 5
>   Total Devices : 5
> Preferred Minor : 4
> 
>     Update Time : Mon Feb 28 21:10:13 2005
>           State : clean
>  Active Devices : 5
> Working Devices : 5
>  Failed Devices : 0
>   Spare Devices : 0
>        Checksum : 661328a - correct
>          Events : 0.26324
> 
>          Layout : left-symmetric
>      Chunk Size : 64K
> 
>       Number   Major   Minor   RaidDevice State
> this     0       3       11        0      active sync   /dev/hda11
>    0     0       3       11        0      active sync   /dev/hda11
>    1     1      33       10        1      active sync   /dev/hde10
>    2     2      34        5        2      active sync   /dev/hdg5
>    3     3      57        9        3      active sync   /dev/hdk9
>    4     4      56        7        4      active sync   /dev/hdi7
> /dev/hde10:
>           Magic : a92b4efc
>         Version : 00.90.00
>            UUID : 1da63142:e1bcc45b:e0287a1a:f9c7c3a8
>   Creation Time : Sat Jul 24 12:38:25 2004
>      Raid Level : raid5
>     Device Size : 12281536 (11.71 GiB 12.58 GB)
>    Raid Devices : 5
>   Total Devices : 5
> Preferred Minor : 4
> 
>     Update Time : Mon Feb 28 21:10:13 2005
>           State : dirty
>  Active Devices : 5
> Working Devices : 5
>  Failed Devices : 0
>   Spare Devices : 0
>        Checksum : 66132a6 - correct
>          Events : 0.26322
> 
>          Layout : left-symmetric
>      Chunk Size : 64K
> 
>       Number   Major   Minor   RaidDevice State
> this     1      33       10        1      active sync   /dev/hde10
>    0     0       3       11        0      active sync   /dev/hda11
>    1     1      33       10        1      active sync   /dev/hde10
>    2     2      34        5        2      active sync   /dev/hdg5
>    3     3      57        9        3      active sync   /dev/hdk9
>    4     4      56        7        4      active sync   /dev/hdi7
> /dev/hdg5:
>           Magic : a92b4efc
>         Version : 00.90.00
>            UUID : 1da63142:e1bcc45b:e0287a1a:f9c7c3a8
>   Creation Time : Sat Jul 24 12:38:25 2004
>      Raid Level : raid5
>     Device Size : 12281536 (11.71 GiB 12.58 GB)
>    Raid Devices : 5
>   Total Devices : 5
> Preferred Minor : 4
> 
>     Update Time : Mon Feb 28 21:10:13 2005
>           State : dirty
>  Active Devices : 5
> Working Devices : 5
>  Failed Devices : 0
>   Spare Devices : 0
>        Checksum : 66132a6 - correct
>          Events : 0.26324
> 
>          Layout : left-symmetric
>      Chunk Size : 64K
> 
>       Number   Major   Minor   RaidDevice State
> this     2      34        5        2      active sync   /dev/hdg5
>    0     0       3       11        0      active sync   /dev/hda11
>    1     1      33       10        1      active sync   /dev/hde10
>    2     2      34        5        2      active sync   /dev/hdg5
>    3     3      57        9        3      active sync   /dev/hdk9
>    4     4      56        7        4      active sync   /dev/hdi7
> /dev/hdi7:
>           Magic : a92b4efc
>         Version : 00.90.00
>            UUID : 1da63142:e1bcc45b:e0287a1a:f9c7c3a8
>   Creation Time : Sat Jul 24 12:38:25 2004
>      Raid Level : raid5
>     Device Size : 12281536 (11.71 GiB 12.58 GB)
>    Raid Devices : 5
>   Total Devices : 5
> Preferred Minor : 4
> 
>     Update Time : Mon Feb 28 21:10:13 2005
>           State : dirty
>  Active Devices : 5
> Working Devices : 5
>  Failed Devices : 0
>   Spare Devices : 0
>        Checksum : 66132c2 - correct
>          Events : 0.26324
> 
>          Layout : left-symmetric
>      Chunk Size : 64K
> 
>       Number   Major   Minor   RaidDevice State
> this     4      56        7        4      active sync   /dev/hdi7
>    0     0       3       11        0      active sync   /dev/hda11
>    1     1      33       10        1      active sync   /dev/hde10
>    2     2      34        5        2      active sync   /dev/hdg5
>    3     3      57        9        3      active sync   /dev/hdk9
>    4     4      56        7        4      active sync   /dev/hdi7
> /dev/hdk9:
>           Magic : a92b4efc
>         Version : 00.90.00
>            UUID : 1da63142:e1bcc45b:e0287a1a:f9c7c3a8
>   Creation Time : Sat Jul 24 12:38:25 2004
>      Raid Level : raid5
>     Device Size : 12281536 (11.71 GiB 12.58 GB)
>    Raid Devices : 5
>   Total Devices : 5
> Preferred Minor : 4
> 
>     Update Time : Mon Feb 28 21:10:13 2005
>           State : dirty
>  Active Devices : 5
> Working Devices : 5
>  Failed Devices : 0
>   Spare Devices : 0
>        Checksum : 66132c3 - correct
>          Events : 0.26324
> 
>          Layout : left-symmetric
>      Chunk Size : 64K
> 
>       Number   Major   Minor   RaidDevice State
> this     3      57        9        3      active sync   /dev/hdk9
>    0     0       3       11        0      active sync   /dev/hda11
>    1     1      33       10        1      active sync   /dev/hde10
>    2     2      34        5        2      active sync   /dev/hdg5
>    3     3      57        9        3      active sync   /dev/hdk9
>    4     4      56        7        4      active sync   /dev/hdi7
> 
> 
> I would really appreciate some help.
> 
> Regards,
> Peter

