Hello Hans,

I would try to re-add the out-of-sync disk (hde10) to the degraded
raid5 array (md4). If hde10 gets kicked out again, it is time to
replace it with another disk.
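Untested, but something along these lines should do it once the
array is running degraded (this assumes the disk itself is still
healthy):

# mdadm /dev/md4 --add /dev/hde10

Then watch the rebuild progress with:

# cat /proc/mdstat

With the old 0.90 superblocks, --add puts hde10 back in as a spare
and the kernel immediately starts a full resync onto it.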
-- 
Regards, Mike T.

On Fri, 2005-03-04 at 13:42, hpg@xxxxxxxxxxxxx wrote:
> Hello everyone,
> 
> I need your help with some strange behavior of a raid5 array.
> 
> My Linux fileserver froze for an unknown reason. No mouse movement,
> no console, no disk activity, nothing. So I had to hit the reset
> button.
> 
> At boot time 5 raid5 arrays came up active without any faults.
> Two other raid5 arrays resynchronized successfully.
> Only one had trouble recovering.
> 
> Because I am using LVM2 on top of all my raid5 arrays, and the root
> filesystem lives in the volume group that sits on the raid5 array in
> question, I had to boot from a Fedora Core 3 Rescue CDROM.
> 
> # uname -a
> Linux localhost.localdomain 2.6.9-1.667 #1 Tue Nov 2 14:41:31 EST 2004
> i686 unknown
> 
> At boot time I get the following:
> 
> [...]
> md: autorun ...
> md: considering hdi7 ...
> md: adding hdi7 ...
> md: adding hdk9 ...
> md: adding hdg5 ...
> md: adding hde10 ...
> md: adding hda11 ...
> md: created md4
> md: bind<hda11>
> md: bind<hde10>
> md: bind<hdg5>
> md: bind<hdk9>
> md: bind<hdi7>
> md: running: <hdi7><hdk9><hdg5><hde10><hda11>
> md: kicking non-fresh hde10 from array!
> md: unbind<hde10>
> md: export_rdev(hde10)
> md: md4: raid array is not clean -- starting background reconstruction
> raid5: device hdi7 operational as raid disk 4
> raid5: device hdk9 operational as raid disk 3
> raid5: device hdg5 operational as raid disk 2
> raid5: device hda11 operational as raid disk 0
> raid5: cannot start dirty degraded array for md4
> RAID5 conf printout:
>  --- rd:5 wd:4 fd:1
>  disk 0, o:1, dev:hda11
>  disk 2, o:1, dev:hdg5
>  disk 3, o:1, dev:hdk9
>  disk 4, o:1, dev:hdi7
> raid5: failed to run raid set md4
> md: pers->run() failed ...
> md :do_md_run() returned -22
> md: md4 stopped.
> md: unbind<hdi7>
> md: export_rdev(hdi7)
> md: unbind<hdk9>
> md: export_rdev(hdk9)
> md: unbind<hdg5>
> md: export_rdev(hdg5)
> md: unbind<hda11>
> md: export_rdev(hda11)
> md: ... autorun DONE.
> [...]
> 
> So I tried to reassemble the array:
> 
> # mdadm --assemble /dev/md4 /dev/hda11 /dev/hde10 /dev/hdg5 /dev/hdk9 \
>     /dev/hdi7
> mdadm: /dev/md4 assembled from 4 drives - need all 5 to start it
> (use --run to insist)
> 
> # dmesg
> [...]
> md: md4 stopped.
> md: bind<hde10>
> md: bind<hdg5>
> md: bind<hdk9>
> md: bind<hdi7>
> md: bind<hda11>
> 
> # cat /proc/mdstat
> Personalities : [raid0] [raid1] [raid5] [raid6]
> md1 : active raid5 hdi1[4] hdk1[3] hdg1[2] hde7[1] hda3[0]
>       81919744 blocks level 5, 64k chunk, algorithm 2 [5/5] [UUUUU]
> 
> md2 : active raid5 hdi2[4] hdk2[3] hdg2[2] hde8[1] hda5[0]
>       81919744 blocks level 5, 64k chunk, algorithm 2 [5/5] [UUUUU]
> 
> md3 : active raid5 hdi3[4] hdk3[3] hdg3[2] hde9[1] hda6[0]
>       81919744 blocks level 5, 64k chunk, algorithm 2 [5/5] [UUUUU]
> 
> md4 : inactive hda11[0] hdi7[4] hdk9[3] hdg5[2] hde10[1]
>       65246272 blocks
> 
> md5 : active raid5 hdl5[3] hdi5[2] hdk5[1] hda7[0]
>       61439808 blocks level 5, 64k chunk, algorithm 0 [4/4] [UUUU]
> 
> md6 : active raid5 hdl6[3] hdi6[2] hdk6[1] hda8[0]
>       61439808 blocks level 5, 64k chunk, algorithm 0 [4/4] [UUUU]
> 
> md7 : active raid5 hdl7[2] hdk7[1] hda9[0]
>       40965504 blocks level 5, 64k chunk, algorithm 0 [3/3] [UUU]
> 
> md8 : active raid5 hdl8[2] hdk8[1] hda10[0]
>       40965504 blocks level 5, 64k chunk, algorithm 0 [3/3] [UUU]
> 
> unused devices: <none>
> 
> # mdadm --stop /dev/md4
> # mdadm --assemble --run /dev/md4 /dev/hda11 /dev/hde10 /dev/hdg5 \
>     /dev/hdk9 /dev/hdi7
> mdadm: /dev/md4 has been started with 4 drives (out of 5).
> 
> # cat /proc/mdstat
> [...]
> md4 : active raid5 hda11[0] hdi7[4] hdk9[3] hdg5[2]
>       49126144 blocks level 5, 64k chunk, algorithm 2 [5/4] [U_UUU]
> [...]
> 
> # dmesg
> [...]
> md: bind<hde10>
> md: bind<hdg5>
> md: bind<hdk9>
> md: bind<hdi7>
> md: bind<hda11>
> md: kicking non-fresh hde10 from array!
> md: unbind<hde10>
> md: export_rdev(hde10)
> raid5: device hda11 operational as raid disk 0
> raid5: device hdi7 operational as raid disk 4
> raid5: device hdk9 operational as raid disk 3
> raid5: device hdg5 operational as raid disk 2
> raid5: allocated 5248kB for md4
> raid5: raid level 5 set md4 active with 4 out of 5 devices, algorithm 2
> RAID5 conf printout:
>  --- rd:5 wd:4 fd:1
>  disk 0, o:1, dev:hda11
>  disk 2, o:1, dev:hdg5
>  disk 3, o:1, dev:hdk9
>  disk 4, o:1, dev:hdi7
> 
> So far everything looks OK to me.
> But now things become funny:
> 
> # dd if=/dev/md4 of=/dev/null
> 0+0 records in
> 0+0 records out
> 
> # mdadm --stop /dev/md4
> mdadm: fail to stop array /dev/md4: Device or resource busy
> 
> # dmesg
> [...]
> md: md4 still in use.
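That "Device or resource busy" is probably not md itself: since LVM2
sits on top of md4, the rescue environment has most likely scanned
the freshly started array and activated your volume group, which
keeps md4 open. If that guess is right, deactivating the volume
group first should let you stop the array (the VG name below is just
a placeholder):

# lvm vgchange -an <your_volume_group>
# mdadm --stop /dev/md4

The dd run above reporting 0+0 records from the active array is also
odd; checking what size the kernel thinks the device has, e.g. with
"blockdev --getsize /dev/md4", might tell you whether it is readable
at all.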
> 
> # dd if=/dev/hda11 of=/dev/null count=1000
> 1000+0 records in
> 1000+0 records out
> # dd if=/dev/hde10 of=/dev/null count=1000
> 1000+0 records in
> 1000+0 records out
> # dd if=/dev/hdg5 of=/dev/null count=1000
> 1000+0 records in
> 1000+0 records out
> # dd if=/dev/hdi7 of=/dev/null count=1000
> 1000+0 records in
> 1000+0 records out
> # dd if=/dev/hdk9 of=/dev/null count=1000
> 1000+0 records in
> 1000+0 records out
> # dd if=/dev/md1 of=/dev/null count=1000
> 1000+0 records in
> 1000+0 records out
> # dd if=/dev/md2 of=/dev/null count=1000
> 1000+0 records in
> 1000+0 records out
> # dd if=/dev/md3 of=/dev/null count=1000
> 1000+0 records in
> 1000+0 records out
> # dd if=/dev/md5 of=/dev/null count=1000
> 1000+0 records in
> 1000+0 records out
> # dd if=/dev/md6 of=/dev/null count=1000
> 1000+0 records in
> 1000+0 records out
> # dd if=/dev/md7 of=/dev/null count=1000
> 1000+0 records in
> 1000+0 records out
> # dd if=/dev/md8 of=/dev/null count=1000
> 1000+0 records in
> 1000+0 records out
> 
> Now some details that were still missing:
> 
> # mdadm --detail /dev/md4
> /dev/md4:
>         Version : 00.90.01
>   Creation Time : Sat Jul 24 12:38:25 2004
>      Raid Level : raid5
>     Device Size : 12281536 (11.71 GiB 12.58 GB)
>    Raid Devices : 5
>   Total Devices : 4
> Preferred Minor : 4
>     Persistence : Superblock is persistent
> 
>     Update Time : Mon Feb 28 21:10:13 2005
>           State : clean, degraded
>  Active Devices : 4
> Working Devices : 4
>  Failed Devices : 0
>   Spare Devices : 0
> 
>          Layout : left-symmetric
>      Chunk Size : 64K
> 
>     Number   Major   Minor   RaidDevice   State
>        0       3       11        0        active sync   /dev/hda11
>        1       0        0       -1        removed
>        2      34        5        2        active sync   /dev/hdg5
>        3      57        9        3        active sync   /dev/hdk9
>        4      56        7        4        active sync   /dev/hdi7
>            UUID : 1da63142:e1bcc45b:e0287a1a:f9c7c3a8
>          Events : 0.26324
> 
> # mdadm --examine /dev/hda11 /dev/hde10 /dev/hdg5 /dev/hdi7 /dev/hdk9
> /dev/hda11:
>           Magic : a92b4efc
>         Version : 00.90.00
>            UUID : 1da63142:e1bcc45b:e0287a1a:f9c7c3a8
>   Creation Time : Sat Jul 24 12:38:25 2004
>      Raid Level : raid5
>     Device Size : 12281536 (11.71 GiB 12.58 GB)
>    Raid Devices : 5
>   Total Devices : 5
> Preferred Minor : 4
> 
>     Update Time : Mon Feb 28 21:10:13 2005
>           State : clean
>  Active Devices : 5
> Working Devices : 5
>  Failed Devices : 0
>   Spare Devices : 0
>        Checksum : 661328a - correct
>          Events : 0.26324
> 
>          Layout : left-symmetric
>      Chunk Size : 64K
> 
>       Number   Major   Minor   RaidDevice   State
> this     0       3       11        0        active sync   /dev/hda11
>    0     0       3       11        0        active sync   /dev/hda11
>    1     1      33       10        1        active sync   /dev/hde10
>    2     2      34        5        2        active sync   /dev/hdg5
>    3     3      57        9        3        active sync   /dev/hdk9
>    4     4      56        7        4        active sync   /dev/hdi7
> 
> /dev/hde10:
>           Magic : a92b4efc
>         Version : 00.90.00
>            UUID : 1da63142:e1bcc45b:e0287a1a:f9c7c3a8
>   Creation Time : Sat Jul 24 12:38:25 2004
>      Raid Level : raid5
>     Device Size : 12281536 (11.71 GiB 12.58 GB)
>    Raid Devices : 5
>   Total Devices : 5
> Preferred Minor : 4
> 
>     Update Time : Mon Feb 28 21:10:13 2005
>           State : dirty
>  Active Devices : 5
> Working Devices : 5
>  Failed Devices : 0
>   Spare Devices : 0
>        Checksum : 66132a6 - correct
>          Events : 0.26322
> 
>          Layout : left-symmetric
>      Chunk Size : 64K
> 
>       Number   Major   Minor   RaidDevice   State
> this     1      33       10        1        active sync   /dev/hde10
>    0     0       3       11        0        active sync   /dev/hda11
>    1     1      33       10        1        active sync   /dev/hde10
>    2     2      34        5        2        active sync   /dev/hdg5
>    3     3      57        9        3        active sync   /dev/hdk9
>    4     4      56        7        4        active sync   /dev/hdi7
> 
> /dev/hdg5:
>           Magic : a92b4efc
>         Version : 00.90.00
>            UUID : 1da63142:e1bcc45b:e0287a1a:f9c7c3a8
>   Creation Time : Sat Jul 24 12:38:25 2004
>      Raid Level : raid5
>     Device Size : 12281536 (11.71 GiB 12.58 GB)
>    Raid Devices : 5
>   Total Devices : 5
> Preferred Minor : 4
> 
>     Update Time : Mon Feb 28 21:10:13 2005
>           State : dirty
>  Active Devices : 5
> Working Devices : 5
>  Failed Devices : 0
>   Spare Devices : 0
>        Checksum : 66132a6 - correct
>          Events : 0.26324
> 
>          Layout : left-symmetric
>      Chunk Size : 64K
> 
>       Number   Major   Minor   RaidDevice   State
> this     2      34        5        2        active sync   /dev/hdg5
>    0     0       3       11        0        active sync   /dev/hda11
>    1     1      33       10        1        active sync   /dev/hde10
>    2     2      34        5        2        active sync   /dev/hdg5
>    3     3      57        9        3        active sync   /dev/hdk9
>    4     4      56        7        4        active sync   /dev/hdi7
> 
> /dev/hdi7:
>           Magic : a92b4efc
>         Version : 00.90.00
>            UUID : 1da63142:e1bcc45b:e0287a1a:f9c7c3a8
>   Creation Time : Sat Jul 24 12:38:25 2004
>      Raid Level : raid5
>     Device Size : 12281536 (11.71 GiB 12.58 GB)
>    Raid Devices : 5
>   Total Devices : 5
> Preferred Minor : 4
> 
>     Update Time : Mon Feb 28 21:10:13 2005
>           State : dirty
>  Active Devices : 5
> Working Devices : 5
>  Failed Devices : 0
>   Spare Devices : 0
>        Checksum : 66132c2 - correct
>          Events : 0.26324
> 
>          Layout : left-symmetric
>      Chunk Size : 64K
> 
>       Number   Major   Minor   RaidDevice   State
> this     4      56        7        4        active sync   /dev/hdi7
>    0     0       3       11        0        active sync   /dev/hda11
>    1     1      33       10        1        active sync   /dev/hde10
>    2     2      34        5        2        active sync   /dev/hdg5
>    3     3      57        9        3        active sync   /dev/hdk9
>    4     4      56        7        4        active sync   /dev/hdi7
> 
> /dev/hdk9:
>           Magic : a92b4efc
>         Version : 00.90.00
>            UUID : 1da63142:e1bcc45b:e0287a1a:f9c7c3a8
>   Creation Time : Sat Jul 24 12:38:25 2004
>      Raid Level : raid5
>     Device Size : 12281536 (11.71 GiB 12.58 GB)
>    Raid Devices : 5
>   Total Devices : 5
> Preferred Minor : 4
> 
>     Update Time : Mon Feb 28 21:10:13 2005
>           State : dirty
>  Active Devices : 5
> Working Devices : 5
>  Failed Devices : 0
>   Spare Devices : 0
>        Checksum : 66132c3 - correct
>          Events : 0.26324
> 
>          Layout : left-symmetric
>      Chunk Size : 64K
> 
>       Number   Major   Minor   RaidDevice   State
> this     3      57        9        3        active sync   /dev/hdk9
>    0     0       3       11        0        active sync   /dev/hda11
>    1     1      33       10        1        active sync   /dev/hde10
>    2     2      34        5        2        active sync   /dev/hdg5
>    3     3      57        9        3        active sync   /dev/hdk9
>    4     4      56        7        4        active sync   /dev/hdi7
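Note the superblocks above: hde10 is still at Events 0.26322 while
the other four members are at 0.26324. That event-counter gap is
what makes the kernel declare the disk "non-fresh" and kick it at
assembly time. A successful re-add and resync rewrites hde10's
superblock, after which the counters should match again; you can
verify with something like:

# mdadm --examine /dev/hde10 | grep Events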
> 
> I really would appreciate some help.
> 
> Regards,
> Peter
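PS: before trusting hde10 long-term I would also give the drive
itself a quick health check, assuming smartmontools is available in
your rescue environment:

# smartctl -a /dev/hde

If that shows reallocated or pending sectors, swap the disk sooner
rather than later.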