On Tue, 1 Mar 2011 21:05:33 -0800 (PST) jahammonds prost <gmitch64@xxxxxxxxx> wrote:

> I've just had a 3rd drive fail on one of my RAID 6 arrays, and I'm looking for
> some advice on how to get it back enough that I can recover the data, and then
> replace the other failed drives.
>
> mdadm -V
> mdadm - v3.0.3 - 22nd October 2009
>
> Not the most up to date release, but it seems to be the latest one available on
> FC12.
>
> The /etc/mdadm.conf file is
>
> ARRAY /dev/md0 uuid=1470c671:4236b155:67287625:899db153
>
> Which explains why I didn't get emailed about the drive failures. This isn't my
> standard file, and I don't know how it was changed, but that's another issue for
> another day.
>
> mdadm --detail /dev/md0
> /dev/md0:
>         Version : 1.2
>   Creation Time : Sat Jun  5 10:38:11 2010
>      Raid Level : raid6
>   Used Dev Size : 488383488 (465.76 GiB 500.10 GB)
>    Raid Devices : 15
>   Total Devices : 12
>     Persistence : Superblock is persistent
>
>     Update Time : Tue Mar  1 22:17:41 2011
>           State : active, degraded, Not Started
>  Active Devices : 12
> Working Devices : 12
>  Failed Devices : 0
>   Spare Devices : 0
>
>      Chunk Size : 512K
>
>            Name : file00bert.woodlea.org.uk:0  (local to host file00bert.woodlea.org.uk)
>            UUID : 1470c671:4236b155:67287625:899db153
>          Events : 254890
>
>     Number   Major   Minor   RaidDevice State
>        0       8      113        0      active sync   /dev/sdh1
>        1       8       17        1      active sync   /dev/sdb1
>        2       8      177        2      active sync   /dev/sdl1
>        3       0        0        3      removed
>        4       8       33        4      active sync   /dev/sdc1
>        5       8      193        5      active sync   /dev/sdm1
>        6       0        0        6      removed
>        7       8       49        7      active sync   /dev/sdd1
>        8       8      209        8      active sync   /dev/sdn1
>        9       8      161        9      active sync   /dev/sdk1
>       10       0        0       10      removed
>       11       8      225       11      active sync   /dev/sdo1
>       12       8       81       12      active sync   /dev/sdf1
>       13       8      241       13      active sync   /dev/sdp1
>       14       8        1       14      active sync   /dev/sda1
>
> The output from the failed drives is as follows.
>
> mdadm --examine /dev/sde1
> /dev/sde1:
>           Magic : a92b4efc
>         Version : 1.2
>     Feature Map : 0x1
>      Array UUID : 1470c671:4236b155:67287625:899db153
>            Name : file00bert.woodlea.org.uk:0  (local to host file00bert.woodlea.org.uk)
>   Creation Time : Sat Jun  5 10:38:11 2010
>      Raid Level : raid6
>    Raid Devices : 15
>
>  Avail Dev Size : 976767730 (465.76 GiB 500.11 GB)
>      Array Size : 12697970688 (6054.86 GiB 6501.36 GB)
>   Used Dev Size : 976766976 (465.76 GiB 500.10 GB)
>     Data Offset : 272 sectors
>    Super Offset : 8 sectors
>           State : clean
>     Device UUID : 3e284f2e:d939fb97:0b74eb88:326e879c
>
> Internal Bitmap : 2 sectors from superblock
>     Update Time : Tue Mar  1 21:53:31 2011
>        Checksum : 768f0f34 - correct
>          Events : 254591
>
>      Chunk Size : 512K
>
>     Device Role : Active device 10
>     Array State : AAA.AA.AAAAAAAA ('A' == active, '.' == missing)
>
> The above is the drive that failed tonight, and the one I would like to re-add
> to the array. There have been no writes to the filesystem on the array in
> the last couple of days (other than what ext4 would do on its own).
>
> mdadm --examine /dev/sdi1
> /dev/sdi1:
>           Magic : a92b4efc
>         Version : 1.2
>     Feature Map : 0x1
>      Array UUID : 1470c671:4236b155:67287625:899db153
>            Name : file00bert.woodlea.org.uk:0  (local to host file00bert.woodlea.org.uk)
>   Creation Time : Sat Jun  5 10:38:11 2010
>      Raid Level : raid6
>    Raid Devices : 15
>
>  Avail Dev Size : 976767730 (465.76 GiB 500.11 GB)
>      Array Size : 12697970688 (6054.86 GiB 6501.36 GB)
>   Used Dev Size : 976766976 (465.76 GiB 500.10 GB)
>     Data Offset : 272 sectors
>    Super Offset : 8 sectors
>           State : active
>     Device UUID : 8e668e39:06d8281b:b79aa3ab:a1d55fb5
>
> Internal Bitmap : 2 sectors from superblock
>     Update Time : Thu Feb 10 18:20:54 2011
>        Checksum : 4078396b - correct
>          Events : 254075
>
>      Chunk Size : 512K
>
>     Device Role : Active device 3
>     Array State : AAAAAA.AAAAAAAA ('A' == active, '.' == missing)
>
> mdadm --examine /dev/sdj1
> /dev/sdj1:
>           Magic : a92b4efc
>         Version : 1.2
>     Feature Map : 0x1
>      Array UUID : 1470c671:4236b155:67287625:899db153
>            Name : file00bert.woodlea.org.uk:0  (local to host file00bert.woodlea.org.uk)
>   Creation Time : Sat Jun  5 10:38:11 2010
>      Raid Level : raid6
>    Raid Devices : 15
>
>  Avail Dev Size : 976767730 (465.76 GiB 500.11 GB)
>      Array Size : 12697970688 (6054.86 GiB 6501.36 GB)
>   Used Dev Size : 976766976 (465.76 GiB 500.10 GB)
>     Data Offset : 272 sectors
>    Super Offset : 8 sectors
>           State : active
>     Device UUID : 37d422cc:8436960a:c3c4d11c:81a8e4fa
>
> Internal Bitmap : 2 sectors from superblock
>     Update Time : Thu Oct 21 23:45:06 2010
>        Checksum : 78950bb5 - correct
>          Events : 21435
>
>      Chunk Size : 512K
>
>     Device Role : Active device 6
>     Array State : AAAAAAAAAAAAAAA ('A' == active, '.' == missing)
>
> Looks like sdj1 failed waaay back in Oct last year (sigh). As I said, I am not
> too bothered about adding these last 2 drives back into the array, since they
> failed so long ago. I have a couple of spare drives sitting here, and I will
> replace these 2 drives with them (once I have completed a badblocks run on them).
> Looking at the output of dmesg, there are no other errors showing for the 3
> drives, other than them being kicked out of the array for being non-fresh.
>
> I guess I have a couple of questions.
>
> What's the correct process for adding the failed /dev/sde1 back into the array
> so I can start it? I don't want to rush into this and make things worse.

If you think that the drives really are working and that it was a cabling
problem, then stop the array (if it isn't stopped already) and assemble with
--force:

  mdadm --assemble --force /dev/md0 /dev....list of devices

Then find the devices that it chose not to include and add them individually:

  mdadm /dev/md0 --add /dev/something

However, if any device has a bad block that cannot be read, then this won't
work.  In that case you need to get a new device, partition it to have a
partition EXACTLY the same size, use dd_rescue to copy all the good data from
the bad drive to the new drive, remove the bad drive from the system, and run
the "--assemble --force" command using the new drive, not the old drive
(a command-level sketch of this follows below).

> What's the correct process for replacing the 2 other drives?
> I am presuming that I need to --fail, then --remove, then --add the drives (one
> at a time?), but I want to make sure.

They are already failed and removed, so there is no point in trying to do that
again.

Good luck.

NeilBrown

> Thanks for your help.
>
> Graham.
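
A minimal command-level sketch of the replacement path described above. The
device names are illustrative only: it assumes the member with unreadable
sectors is /dev/sde (array slot "Active device 10"), the replacement disk
shows up as /dev/sdq, the drives use MBR partition tables (so sfdisk can
clone them), and dd_rescue is installed. Adjust every name to the real system
before running anything.

  # Stop the (degraded, not-started) array before touching its members.
  mdadm --stop /dev/md0

  # Clone the partition table so the new partition is EXACTLY the same size.
  sfdisk -d /dev/sde | sfdisk /dev/sdq

  # Copy as much data as possible from the failing member to the new one.
  dd_rescue /dev/sde1 /dev/sdq1

  # Remove the bad drive from the system, then force-assemble using the copy
  # plus the 12 members still marked active in the --detail output above.
  mdadm --assemble --force /dev/md0 /dev/sd[abcdfhklmnop]1 /dev/sdq1

  # Verify, then add fresh (badblocks-tested) disks to replace the two
  # long-failed members; /dev/sdr1 and /dev/sds1 are placeholders.
  cat /proc/mdstat
  mdadm /dev/md0 --add /dev/sdr1
  mdadm /dev/md0 --add /dev/sds1

If /dev/sde1 turns out to read without errors, the sfdisk/dd_rescue steps can
be skipped and /dev/sde1 passed straight to the "--assemble --force" command
instead of /dev/sdq1.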