On 20 Nov 2012 17:11:45 -0500 "George Spelvin" <linux@xxxxxxxxxxx> wrote:

> I have a RAID10 array with 4 active + 1 spare.
> Kernel is 3.6.5, x86-64 but running 32-bit userland.
>
> After a recent failure on sdd2, the spare sdc2 was
> activated and things looked something like (manual edit,
> may not be perfectly faithful):
>
> md5 : active raid10 sdd2[4](F) sdb2[1] sde2[2] sdc2[3] sda2[0]
>       725591552 blocks 256K chunks 2 near-copies [4/4] [UUUU]
>       bitmap: 50/173 pages [200KB], 2048KB chunk
>
> smartctl -A showed 1 pending sector, but badblocks didn't
> find it, so I decided to play with moving things back:
>
> # badblocks -s -v /dev/sdd2
> # mdadm /dev/md5 -r /dev/sdd2 -a /dev/sdd2
> # echo want_replacement > /sys/block/md5/md/dev-sdc2/state
>
> This ran for a while, but now it has stopped, with the following
> configuration:
>
> md5 : active raid10 sdd2[3](R) sdb2[1] sde2[2] sdc2[4](F) sda2[0]
>       725591552 blocks 256K chunks 2 near-copies [4/4] [UUU_]
>       bitmap: 50/173 pages [200KB], 2048KB chunk
>
> [530]# cat /sys/block/md5/md/dev-sd?2/state
> in_sync
> in_sync
> faulty,want_replacement
> in_sync,replacement
> in_sync
>
> I'm not quite sure how to interpret this state, and why it is showing
> "4/4" good drives but [UUU_].

"4/4" means the array is not degraded.
[UUU_] means that the drive in slot 3 is faulty.
The way this can happen without the array being degraded is that the
replacement for slot 3 is fully in-sync.

What has happened is that the replacement finished perfectly and the
want-replace device was marked as faulty, but when md tried to remove
that faulty device it found that it was still active: some request
that had previously been sent hadn't completed yet.  So it couldn't
remove the device immediately.  Unfortunately it doesn't retry in any
great hurry ... or possibly at all.  I'll have to look into that and
figure out the best fix.

...

> It appears to have completed:
> Nov 20 18:40:01 science kernel: md: md5: recovery done.
> Nov 20 18:40:01 science kernel: RAID10 conf printout:
> Nov 20 18:40:01 science kernel:  --- wd:4 rd:4
> Nov 20 18:40:01 science kernel:  disk 0, wo:0, o:1, dev:sda2
> Nov 20 18:40:01 science kernel:  disk 1, wo:0, o:1, dev:sdb2
> Nov 20 18:40:01 science kernel:  disk 2, wo:0, o:1, dev:sde2
> Nov 20 18:40:01 science kernel:  disk 3, wo:1, o:0, dev:sdc2
>
> But as mentioned, the RAID state is a bit odd.  sdc2 is still in the
> array and sdd2 is not.

Yes, it completed.  The "conf printout" doesn't mention replacement
devices yet.  I guess it should.

NeilBrown
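For reference, a minimal sketch of the hot-replace sequence George ran,
using the same device names as the thread (replace in-sync member sdc2
of array /dev/md5 with a freshly tested sdd2); adjust for your own array:

    # surface-check the candidate device first (read-only scan)
    badblocks -s -v /dev/sdd2

    # remove the previously failed device, then re-add it as a spare
    # (the long forms of the -r / -a flags used in the thread)
    mdadm /dev/md5 --remove /dev/sdd2
    mdadm /dev/md5 --add /dev/sdd2

    # ask md to rebuild sdc2's data onto a spare while sdc2 stays in_sync
    echo want_replacement > /sys/block/md5/md/dev-sdc2/state

    # watch progress; the replacement shows up with an (R) flag in mdstat
    cat /proc/mdstat
    cat /sys/block/md5/md/dev-sd?2/state

Later mdadm releases (3.3 and newer, so after this thread) wrap the sysfs
write in a single command, mdadm /dev/md5 --replace /dev/sdc2.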
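And a possible manual cleanup for the stuck state Neil describes, on the
assumption that the lingering request has completed by the time you try
(if the device still counts as active, the kernel will refuse the removal):

    # drop the faulty, already-replaced device via sysfs ...
    echo remove > /sys/block/md5/md/dev-sdc2/state

    # ... or, equivalently, through mdadm
    mdadm /dev/md5 --remove /dev/sdc2

    # the replacement should take over slot 3; mdstat should read [UUUU]
    cat /proc/mdstat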