On Thu, 15 Dec 2011 12:36:19 -0800 Keith Keller
<kkeller@xxxxxxxxxxxxxxxxxxxxxxxxxx> wrote:

> Hello all,
>
> I have another semi-newbie question.  I had an issue, likely hardware
> related, which forced me to reboot a machine with a RAID6 during a
> rebuild after a previous drive failure.  Now, after some other hardware
> issues, I've been able to successfully assemble the array, but it
> seems to be in an odd state:
>
> # mdadm -D /dev/md0
> /dev/md0:
>         Version : 1.01
>   Creation Time : Thu Sep 29 21:26:35 2011
>      Raid Level : raid6
>      Array Size : 13671797440 (13038.44 GiB 13999.92 GB)
>   Used Dev Size : 1953113920 (1862.63 GiB 1999.99 GB)
>    Raid Devices : 9
>   Total Devices : 11
> Preferred Minor : 0
>     Persistence : Superblock is persistent
>
>     Update Time : Thu Dec 15 12:19:41 2011
>           State : clean, degraded
>  Active Devices : 8
> Working Devices : 11
>  Failed Devices : 0
>   Spare Devices : 3
>
>      Chunk Size : 64K
>
>            Name : 0
>            UUID : 24363b01:90deb9b5:4b51e5df:68b8b6ea
>          Events : 102730
>
>     Number   Major   Minor   RaidDevice State
>        0       8       17        0      active sync   /dev/sdb1
>        6       8      113        1      active sync   /dev/sdh1
>       11       8      177        2      spare rebuilding   /dev/sdl1
>        3       8       65        3      active sync   /dev/sde1
>        4       8       81        4      active sync   /dev/sdf1
>        9       8      145        5      active sync   /dev/sdj1
>       10       8       97        6      active sync   /dev/sdg1
>        7       8      129        7      active sync   /dev/sdi1
>        8       8      161        8      active sync   /dev/sdk1
>
>       12       8      225        -      spare   /dev/sdo1
>       13       8       49        -      spare   /dev/sdd1
>
> # cat /proc/mdstat
> Personalities : [raid6] [raid5] [raid4]
> md0 : active raid6 sdd1[13](S) sdb1[0] sdo1[12](S) sdk1[8] sdi1[7]
>       sdg1[10] sdj1[9] sdf1[4] sde1[3] sdl1[11] sdh1[6]
>       13671797440 blocks super 1.1 level 6, 64k chunk, algorithm 2 [9/8]
>       [UU_UUUUUU]
>
> unused devices: <none>
>
> I'm interpreting this as meaning that a member is missing, but for some
> reason the rebuild on sdl1 has not restarted.

Golly, you must be running an ancient kernel ... I fixed this bug at
least 2 days ago...  Though admittedly I haven't submitted the fix yet,
so maybe you have a good excuse :-)

If you remove both spares:

  mdadm /dev/md0 --remove /dev/sdo1 /dev/sdd1

the rebuild should start.  You can then add them back again with "--add".

http://neil.brown.name/git?p=md;a=commitdiff;h=bd8c7cf40d56ca9ce3a6f72886914193674258d1

> What would be the next logical step to take?

Send an email to linux-raid asking who broke what.. Oh wait, you did that.

NeilBrown

> I've found some posts which imply that setting sync_action
> to repair will work, but I'm a little wary of doing that without knowing
> how risky that is.  Or, reading Documentation/md.txt, perhaps I should
> set it to "recover"?  Or "resync", since it's possible the array was not
> shut down cleanly?
>
> FWIW, I have started the array, activated the LVM volume, and am running
> xfs_repair -n (which is not supposed to do any writes), but otherwise
> haven't risked modifying the filesystem (e.g., by mounting it).  So far
> the xfs_repair seems fine, and has not reported any errors.
>
> Thanks for your help (and patience).
>
> --keith
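
A minimal sketch of the full sequence Neil describes, assuming the array
is /dev/md0 and the spares are /dev/sdo1 and /dev/sdd1 as shown in the
mdadm -D output above (verify each step against /proc/mdstat before
moving on):

  # Drop both spares so md restarts the interrupted rebuild
  # (per the commit Neil links above; device names from mdadm -D)
  mdadm /dev/md0 --remove /dev/sdo1 /dev/sdd1

  # Confirm recovery has restarted on sdl1
  cat /proc/mdstat

  # Once recovery is under way, return the spares to the array
  mdadm /dev/md0 --add /dev/sdo1 /dev/sdd1

  # Check the array state afterwards
  mdadm -D /dev/md0

Since sdo1 and sdd1 are plain spares (marked (S) in /proc/mdstat),
removing them does not touch the array's data.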