On Thu Aug 11, 2011 at 02:48:29PM +0100, Another Sillyname wrote:

> I have a RAID5 array consisting of 4 drives that recently had a problem.
>
> One of the drives 'removed' itself from the array and when I added it
> back it started the background rebuilding I expected, however I then
> noticed from smartctl that the drive was showing 'imminent failure'
> due to 3300+ reallocated sector errors.
>
> At this stage I decided I wanted to pull the drive before it finished
> the rebuild and replace it.
>
> However after I stopped the array using:-
>
> mdadm --stop /dev/md126
>
> I was unable to put that drive into fail status
>
> mdadm --fail /dev/sdj1
>
> No Such Device
>
Well obviously you can't fail a drive from an array that isn't running
(not to mention that your fail syntax is wrong). What you should have
done (with the array running) is:
    mdadm /dev/md126 --fail /dev/sdj1

> At this stage I decided to leave the array offline till I had a
> replacement drive available to slot in.
>
> I now have the replacement drive and as I was unable to either fail or
> remove the offending drive I decided to do a physical pull of the
> drive, reboot the machine to show the drive remove and then a second
> reboot with the new blank drive available.
>
There's no need for all the rebooting. Simply replacing the offending
drive with the new one and restarting the array (either by reboot or a
controller scan and array re-assemble) would have worked fine.
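
For reference, here is a minimal sketch of the usual order of operations on a running array. Only the --fail line comes from the advice above; the --remove and --add steps are my assumption of the standard mdadm manage-mode sequence, and the helper name swap_member_cmds is hypothetical. It only prints the commands so they can be reviewed before anything is run as root.

```shell
# Hypothetical dry-run helper: print (do not run) the mdadm commands for
# swapping a failing member out of a RUNNING array.
#   $1 = array (e.g. /dev/md126)
#   $2 = failing member
#   $3 = replacement member
swap_member_cmds() {
    array=$1
    old=$2
    new=$3
    # Mark the member faulty first; mdadm only removes failed or spare devices.
    printf 'mdadm %s --fail %s\n' "$array" "$old"
    printf 'mdadm %s --remove %s\n' "$array" "$old"
    # Assumed to happen after the disk has been physically swapped and partitioned.
    printf 'mdadm %s --add %s\n' "$array" "$new"
}

# Example (prints three commands for review):
#   swap_member_cmds /dev/md126 /dev/sdj1 /dev/sdj1
```

The array name always comes before the action, which is the syntax point made above. None of this applies once the array has been stopped.
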
> This seems to have partially worked in that
>
> mdadm -D /dev/md126
> /dev/md126:
>         Version : 1.2
>   Creation Time : Sat Aug 6 01:24:12 2011
>      Raid Level : raid5
>   Used Dev Size : 1953512960 (1863.02 GiB 2000.40 GB)
>    Raid Devices : 4
>   Total Devices : 3
>     Persistence : Superblock is persistent
>
>     Update Time : Sun Aug 7 05:23:45 2011
>           State : active, degraded, Not Started
>  Active Devices : 3
> Working Devices : 3
>  Failed Devices : 0
>   Spare Devices : 0
>
>          Layout : left-symmetric
>      Chunk Size : 512K
>
>            Name : MY_NEW_RAID
>            UUID : herro_this_isnt_needed
>          Events : 36003
>
>     Number   Major   Minor   RaidDevice State
>        0       8      129        0      active sync   /dev/sdi1
>        1       0        0        1      removed
>        2       8      161        2      active sync   /dev/sdk1
>        3       8      177        3      active sync   /dev/sdl1
>
> Which is what I expected to see.
>
Yep, the removed drive is no longer in the array at all.

> However I cannot add the replacement drive into the array.
>
> ~ >:mdadm --add /dev/md126 /dev/sdj1
> mdadm: add new device failed for /dev/sdj1 as 4: Invalid argument
>
You really need to check dmesg here to see why it's been rejected.

> ~ >:mdadm --add --force /dev/md126 /dev/sdj1
> mdadm: set device faulty failed for /dev/sdj1: No such device
>
I've no idea what it's doing here. Are you sure that's exactly what you
typed? If you'd missed a "-" before the force then it may be
interpreting it as "-f" instead, which would fail as /dev/sdj1 is not in
the array.

> ~ >:mdadm --re-add /dev/md126 /dev/sdj1
> mdadm: --re-add for /dev/sdj1 to /dev/md126 is not possible
>
As the new drive does not contain any array metadata, it can't be
re-added here.
> and even more confusingly
>
> ~ >:mdadm -E /dev/sdj1
> /dev/sdj1:
>           Magic : a92b4efc
>         Version : 1.2
>     Feature Map : 0x0
>      Array UUID : not needed
>            Name : My_NEW_RAID
>   Creation Time : Sat Aug 6 01:24:12 2011
>      Raid Level : raid5
>    Raid Devices : 4
>
>  Avail Dev Size : 3907027053 (1863.02 GiB 2000.40 GB)
>      Array Size : 11721077760 (5589.05 GiB 6001.19 GB)
>   Used Dev Size : 3907025920 (1863.02 GiB 2000.40 GB)
>     Data Offset : 2048 sectors
>    Super Offset : 8 sectors
>           State : active
>     Device UUID : not needed
>
>     Update Time : Sun Aug 7 05:23:45 2011
>        Checksum : 6172254 - correct
>          Events : 0
>
>          Layout : left-symmetric
>      Chunk Size : 512K
>
>     Device Role : spare
>     Array State : AAAA ('A' == active, '.' == missing)
>
>
> Could someone possibly point me in the right direction as to what I'm
> doing wrong?
>
What's the output of "cat /proc/mdstat" at this point? If it doesn't
show /dev/sdj1 as being in the array at all, then I'd go with trying to
add it again:
    mdadm /dev/md126 --add /dev/sdj1

If that still fails, check "dmesg", and possibly try running with -vv to
get a more verbose error.

Cheers,
    Robin
-- 
     ___
    ( ' }     |       Robin Hill        <robin@xxxxxxxxxxxxxxx> |
   / / )      | Little Jim says ....                            |
  // !!       |      "He fallen in de water !!"                 |
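
P.S. The /proc/mdstat check suggested above could be scripted roughly as follows. This is only a sketch: the function name in_array is hypothetical, and it assumes the usual mdstat layout in which each array line reads "mdNNN : <state> ... member[slot] ...".

```shell
# Hypothetical helper: read mdstat-formatted text on stdin and succeed
# (exit 0) only if device $2 (e.g. sdj1) is listed as a member of
# array $1 (e.g. md126).
in_array() {
    array=$1
    dev=$2
    awk -v a="$array" -v d="$dev" '
        # Array lines look like: md126 : active raid5 sdi1[0] sdl1[3] sdk1[2]
        $1 == a && $2 == ":" {
            for (i = 3; i <= NF; i++) {
                name = $i
                sub(/\[.*/, "", name)   # strip the "[slot]" and any flags
                if (name == d) found = 1
            }
        }
        END { exit (found ? 0 : 1) }
    '
}

# Example (needs a live system, so it is left commented out):
#   in_array md126 sdj1 < /proc/mdstat || echo "sdj1 is not in md126"
```

If the device is not listed, retrying the add and then reading dmesg is the next step, as described above.
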