On Sunday 16 May 2010, Leslie Rhorer wrote:
> > -----Original Message-----
> > From: linux-raid-owner@xxxxxxxxxxxxxxx [mailto:linux-raid-owner@xxxxxxxxxxxxxxx]
> > On Behalf Of Pierre Vignéras
> > Sent: Sunday, May 16, 2010 10:41 AM
> > To: linux-raid@xxxxxxxxxxxxxxx
> > Subject: mdadm: failed devices become spares!
> >
> > Hi,
> >
> > I ran into a critical problem with mdadm that I reported to the
> > Debian mailing list (this is a Debian lenny/stable system). They asked
> > me to submit it to you, so that is what I am doing.
> >
> > To avoid duplicating the description, here is the URL of the bug
> > report:
> >
> > http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=578352
> >
> > If you prefer the full details copied and pasted to this mailing list,
> > just ask.
> >
> > Note: the bug happened again today, on another RAID array. The good
> > news is that it is somewhat reproducible! The bad news is that unless
> > you have a magic solution, all my data are lost (half of it was still
> > in the backup pipeline!)...
> >
> > Thanks for any help, and regards.
> > --
> > Pierre Vignéras
>
> It's not quite clear to me from the link whether your drives are truly
> toast or not. If they are, then you are hosed. Assuming not, you need
> to use
>
> `mdadm --examine /dev/sdxx` and `mdadm -Dt /dev/mdyy`
>
> to determine precisely all the parameters and the order of the block
> devices in the array. You need the chunk size, the superblock type,
> which slot was occupied by each device in the array (this may not be
> the same as when the array was created), the size of the array (if it
> did not fill the entire partition in every case), the RAID level, etc.
> Once you are certain you have all the information needed to re-create
> the array, should that become necessary, try to re-assemble the array
> with
>
> `mdadm --assemble --force /dev/mdyy`
>
> If it works, then fsck the file system. (I think I noticed you are
> using XFS. If so, do not use xfs_check; instead, use xfs_repair with
> the -n option.) After you have a clean file system, issue the command
>
> `echo repair > /sys/block/mdyy/md/sync_action`
>
> to re-sync the array. If the array does not assemble, then you will
> need to stop it and re-create it using the options you obtained from
> your research above, adding the --assume-clean switch to prevent a
> resync in case something is wrong. If the fsck won't work after
> re-creating the array, then you probably got one or more of the
> parameters incorrect.

Thanks for your help. Here is what I did:

# cat /proc/mdstat
Personalities : [raid1] [raid10] [raid6] [raid5] [raid4]
[...]
md2 : inactive sdc1[2](S) sdd1[5](S) sdf1[4](S) sde1[3](S)
      1250274304 blocks
[...]
# mdadm --examine /dev/sdc1 /dev/sdd1 /dev/sdf1 /dev/sde1
/dev/sdc1:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : b34f4192:f823df58:24bf28c1:396de87f (local to host phobos)
  Creation Time : Thu Aug  6 01:59:44 2009
     Raid Level : raid10
  Used Dev Size : 312568576 (298.09 GiB 320.07 GB)
     Array Size : 625137152 (596.18 GiB 640.14 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 2

    Update Time : Tue Apr 13 19:22:21 2010
          State : clean
Internal Bitmap : present
 Active Devices : 2
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 2
       Checksum : 5baf7939 - correct
         Events : 90612

         Layout : near=2, far=1
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     2       8       33        2      active sync   /dev/sdc1

   0     0       0        0        0      removed
   1     1       0        0        1      faulty removed
   2     2       8       33        2      active sync   /dev/sdc1
   3     3       8       65        3      active sync   /dev/sde1
   4     4       8       81        4      spare   /dev/sdf1
   5     5       8       49        5      spare   /dev/sdd1

/dev/sdd1:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : b34f4192:f823df58:24bf28c1:396de87f (local to host phobos)
  Creation Time : Thu Aug  6 01:59:44 2009
     Raid Level : raid10
  Used Dev Size : 312568576 (298.09 GiB 320.07 GB)
     Array Size : 625137152 (596.18 GiB 640.14 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 2

    Update Time : Tue Apr 13 19:22:21 2010
          State : clean
Internal Bitmap : present
 Active Devices : 2
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 2
       Checksum : 5baf7949 - correct
         Events : 90612

         Layout : near=2, far=1
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     5       8       49        5      spare   /dev/sdd1

   0     0       0        0        0      removed
   1     1       0        0        1      faulty removed
   2     2       8       33        2      active sync   /dev/sdc1
   3     3       8       65        3      active sync   /dev/sde1
   4     4       8       81        4      spare   /dev/sdf1
   5     5       8       49        5      spare   /dev/sdd1

/dev/sdf1:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : b34f4192:f823df58:24bf28c1:396de87f (local to host phobos)
  Creation Time : Thu Aug  6 01:59:44 2009
     Raid Level : raid10
  Used Dev Size : 312568576 (298.09 GiB 320.07 GB)
     Array Size : 625137152 (596.18 GiB 640.14 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 2

    Update Time : Tue Apr 13 19:22:21 2010
          State : clean
Internal Bitmap : present
 Active Devices : 2
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 2
       Checksum : 5baf7967 - correct
         Events : 90612

         Layout : near=2, far=1
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     4       8       81        4      spare   /dev/sdf1

   0     0       0        0        0      removed
   1     1       0        0        1      faulty removed
   2     2       8       33        2      active sync   /dev/sdc1
   3     3       8       65        3      active sync   /dev/sde1
   4     4       8       81        4      spare   /dev/sdf1
   5     5       8       49        5      spare   /dev/sdd1

/dev/sde1:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : b34f4192:f823df58:24bf28c1:396de87f (local to host phobos)
  Creation Time : Thu Aug  6 01:59:44 2009
     Raid Level : raid10
  Used Dev Size : 312568576 (298.09 GiB 320.07 GB)
     Array Size : 625137152 (596.18 GiB 640.14 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 2

    Update Time : Tue Apr 13 19:22:21 2010
          State : clean
Internal Bitmap : present
 Active Devices : 2
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 2
       Checksum : 5baf795b - correct
         Events : 90612

         Layout : near=2, far=1
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     3       8       65        3      active sync   /dev/sde1

   0     0       0        0        0      removed
   1     1       0        0        1      faulty removed
   2     2       8       33        2      active sync   /dev/sdc1
   3     3       8       65        3      active sync   /dev/sde1
   4     4       8       81        4      spare   /dev/sdf1
   5     5       8       49        5      spare   /dev/sdd1

# mdadm -Dt /dev/md2
mdadm: md device /dev/md2 does not appear to be active.

# mdadm --assemble --force /dev/md2
mdadm: /dev/md2 assembled from 2 drives and 2 spares - not enough to
start the array.
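For the record, if forced assembly cannot be made to work, my reading of your re-create suggestion would be something like the sketch below. It is only a sketch, built from the parameters reported by --examine (0.90 superblock, raid10 near=2, 64K chunks, 4 devices), and it assumes that /dev/sdf1 and /dev/sdd1 were the original members of slots 0 and 1, in that order, which I have not verified. Please correct me if the device order or any other parameter is wrong:

  # NOTE: the slot order below (sdf1, sdd1 in slots 0 and 1) is only my guess
  mdadm --stop /dev/md2
  mdadm --create /dev/md2 --assume-clean --metadata=0.90 \
        --level=10 --layout=n2 --chunk=64 --raid-devices=4 \
        /dev/sdf1 /dev/sdd1 /dev/sdc1 /dev/sde1

I would then run a read-only `xfs_repair -n` on /dev/md2 to check whether the slot order was guessed correctly before writing anything to the array.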
What I don't get is how /dev/sdf1 and /dev/sdd1 came to be marked as spares after being marked as faulty. I never asked for that. As shown in the Debian bug report linked above (repeated here for convenience):

http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=578352

<bug description extract>
...
Apr 12 20:10:02 phobos mdadm[3157]: Fail event detected on md device /dev/md2, component device /dev/sdf1
Apr 12 20:11:02 phobos mdadm[3157]: SpareActive event detected on md device /dev/md2, component device /dev/sdf1

Is that last line normal? It looks to me as though the failed drive was turned into a spare (I really hope I have misunderstood something). Is it possible that the USB subsystem, with its plug-and-play sort of behaviour, made mdadm behave this strangely?
</bug>

And the next question is: how do I activate those two spare drives? I was expecting mdadm to use them automagically. Did I miss something, or is something really strange going on here?

Thanks again.
--
Pierre Vignéras