On Sun, 13 May 2012 20:21:48 +0200 Michał Sawicz <michal@xxxxxxxxxx> wrote:

> Hey,
>
> I've a weird issue with a RAID6 setup, /proc/mdstat says:
>
> > md126 : active raid6 sda1[3] sdh1[6] sdg1[0](F) sdf1[5] sdi1[1] sdc[8] sdb[7]
> >       9767559680 blocks super 1.2 level 6, 512k chunk, algorithm 2 [7/6] [_UUUUUU]
>
> So sdg1 is (F)ailed, yet `mdadm --remove` yields:
>
> > md: cannot remove active disk sdg1 from md126
...

There is a period of time between when a device fails and when the raid456
module finally lets go of it so it can be removed. You seem to be in this
period of time.

Normally it is very short. It needs to wait for any requests that have
already been sent to the device to complete (probably with failure), and very
shortly after that it should be released. So this is normally much less than
one second, but it could be several seconds if some excessive retry is
happening.

But I'm guessing you have waited more than a few seconds.

I vaguely recall a bug in the not too distant past whereby RAID456 wouldn't
let go of a device quite as soon as it should. Unfortunately I don't remember
the details.

You might be able to trigger it to release the drive by adding a spare - if
you have one - or maybe by just

   echo sync > /sys/block/md126/md/sync_action

it won't actually do a sync, but it might check things enough to make
progress.

What kernel are you using?

NeilBrown

>
> in dmesg...
>
> `mdadm --examine` shows:
>
> > /dev/sdg1:
> >           Magic : a92b4efc
> >         Version : 1.2
> >     Feature Map : 0x0
> >      Array UUID : ff9e032c:446ed0bd:fc9473f3:f8e090ed
> >            Name : media:store  (local to host media)
> >   Creation Time : Tue Sep 13 21:36:43 2011
> >      Raid Level : raid6
> >    Raid Devices : 7
> >
> >  Avail Dev Size : 3907024896 (1863.01 GiB 2000.40 GB)
> >      Array Size : 19535119360 (9315.07 GiB 10001.98 GB)
> >   Used Dev Size : 3907023872 (1863.01 GiB 2000.40 GB)
> >     Data Offset : 2048 sectors
> >    Super Offset : 8 sectors
> >           State : clean
> >     Device UUID : 4bcee8e2:709419b6:fbeb3a8e:5c9bb68a
> >
> >     Update Time : Sat May 12 21:57:27 2012
> >        Checksum : ffb03189 - correct
> >          Events : 304564
> >
> >          Layout : left-symmetric
> >      Chunk Size : 512K
> >
> >     Device Role : Active device 0
> >     Array State : AAAAAAA ('A' == active, '.' == missing)
>
> So that superblock thinks it's active, but that's normal, right? It
> wasn't updated due to fail? Others correctly show:
>
> > /dev/sdc:
> >           Magic : a92b4efc
> >         Version : 1.2
> >     Feature Map : 0x0
> >      Array UUID : ff9e032c:446ed0bd:fc9473f3:f8e090ed
> >            Name : media:store  (local to host media)
> >   Creation Time : Tue Sep 13 21:36:43 2011
> >      Raid Level : raid6
> >    Raid Devices : 7
> >
> >  Avail Dev Size : 3907027120 (1863.02 GiB 2000.40 GB)
> >      Array Size : 19535119360 (9315.07 GiB 10001.98 GB)
> >   Used Dev Size : 3907023872 (1863.01 GiB 2000.40 GB)
> >     Data Offset : 2048 sectors
> >    Super Offset : 8 sectors
> >           State : clean
> >     Device UUID : b713fd2b:eef145b0:ce91de0a:9077554b
> >
> >     Update Time : Sat May 12 21:57:57 2012
> >        Checksum : 80345876 - correct
> >          Events : 304581
> >
> >          Layout : left-symmetric
> >      Chunk Size : 512K
> >
> >     Device Role : Active device 2
> >     Array State : .AAAAAA ('A' == active, '.' == missing)
>
> Any ideas?
>
> Cheers,
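
For anyone scripting around this situation, here is a minimal sketch (not something from the thread itself) that pulls the members flagged (F) out of /proc/mdstat-style text, so the removal can be retried once the kernel has let go of the device. The helper name failed_members is made up for illustration, the assumption that each member name maps to /dev/<name> is mine, and it relies on `grep -o`, which GNU, BSD and busybox grep provide but POSIX does not require.

```shell
#!/bin/sh
# failed_members: hypothetical helper, not part of mdadm.
# Reads /proc/mdstat-style text on stdin and prints, one per line, the
# member names marked (F) on the status line of the array named in $1.
failed_members() {
    grep "^$1 :" \
        | grep -o '[a-z0-9-][a-z0-9-]*\[[0-9]*](F)' \
        | sed 's/\[.*//'
}

# Possible use (run as root; assumes member "sdg1" is /dev/sdg1):
#
#   echo sync > /sys/block/md126/md/sync_action   # the nudge suggested above
#   for dev in $(failed_members md126 < /proc/mdstat); do
#       mdadm /dev/md126 --remove "/dev/$dev"
#   done
```

If the kernel still reports "cannot remove active disk", the loop simply fails for that device and can be run again later; it does not force anything.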