Thanks Neil,

I just gave this patched module a shot on four systems. So far, I haven't seen the device number inappropriately increment, though as per a mail I sent a short while ago, that already seemed to be remedied by using the 1.2 superblock, for some reason. However, the patch appears to have introduced a new issue, and another remains unresolved by it; both are described below.
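For reference, the arrays under test are 14-member RAID10 sets (offset layout, 512K chunk, 1.2 metadata), per the -D output further down. A create invocation along these lines reproduces that shape; the device list and md name here are placeholders, not necessarily the exact command I used:

  mdadm --create /dev/md0 --metadata=1.2 --level=10 --layout=o2 \
        --chunk=512 --raid-devices=14 /dev/dm-{0..13}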
// BUG 1
The single-command syntax to fail and remove a drive is still failing; I do not know whether this is somehow contributing to the further (new) issue below:
[root@gtmp06 tmp]# mdadm /dev/md0 --fail /dev/dm-0 --remove /dev/dm-0
mdadm: set /dev/dm-0 faulty in /dev/md0
mdadm: hot remove failed for /dev/dm-0: Device or resource busy
[root@gtmp06 tmp]# mdadm /dev/md0 --remove /dev/dm-0
mdadm: hot removed /dev/dm-0

// BUG 2
Now, upon adding or re-adding a "fail...remove"'d drive, it is not used for resync. I had previously noticed that newly added drives weren't resynced until the existing array build was done, at which point they were picked up. This, however, is a clean/active array that is rejecting the drive.
I've performed this identically on both a clean/active array and a newly-created (resyncing) array, to the same effect. Even after a rebuild or reboot, the removed drive isn't taken back: it remains listed as a "faulty spare", with dmesg indicating that it is "non-fresh". The dmesg line and array status follow; the re-add sequence I plan to try next is sketched below.
// DMESG:
md: kicking non-fresh dm-0 from array!

// ARRAY status ('mdadm -D /dev/md0'):
             State : active, degraded
    Active Devices : 13
   Working Devices : 13
    Failed Devices : 1
     Spare Devices : 0

            Layout : near=1, offset=2
        Chunk Size : 512K

              Name : 0
              UUID : 05c2faf4:facfcad3:ba33b140:100f428a
            Events : 22

    Number   Major   Minor   RaidDevice State
       0     253        1        0      active sync   /dev/dm-1
       1     253        2        1      active sync   /dev/dm-2
       2     253        5        2      active sync   /dev/dm-5
       3     253        4        3      active sync   /dev/dm-4
       4     253        6        4      active sync   /dev/dm-6
       5     253        3        5      active sync   /dev/dm-3
       6     253       13        6      active sync   /dev/dm-13
       7       0        0        7      removed
       8     253        7        8      active sync   /dev/dm-7
       9     253        8        9      active sync   /dev/dm-8
      10     253        9       10      active sync   /dev/dm-9
      11     253       11       11      active sync   /dev/dm-11
      12     253       10       12      active sync   /dev/dm-10
      13     253       12       13      active sync   /dev/dm-12

       7     253        0        -      faulty spare   /dev/dm-0

Let me know what more I can do to help track this down. I'm reverting this patch, since it is behaving less well than before. I'll be happy to try others.
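For the record, this is the re-add sequence mentioned above that I plan to try next, on the theory that the stale superblock (with its old event count) is what triggers the non-fresh kick. It is only a sketch, and note that --zero-superblock wipes the md metadata on the member, so it is only appropriate for a device that is meant to come back as a blank spare:

  # Fail and remove as two separate steps, retrying the remove briefly,
  # since the combined fail+remove invocation reports "Device or resource busy".
  mdadm /dev/md0 --fail /dev/dm-0
  for i in 1 2 3 4 5; do
      mdadm /dev/md0 --remove /dev/dm-0 && break
      sleep 1
  done

  # Wipe the stale superblock so the old event count cannot cause the member
  # to be kicked as "non-fresh", then add it back as a fresh spare.
  mdadm --zero-superblock /dev/dm-0
  mdadm /dev/md0 --add /dev/dm-0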
Attached are typescripts of the drive remove/add sessions and all output.

/eli

Neil Brown wrote:
On Friday October 6, estair@xxxxxxx wrote:
>
> This patch has resolved the immediate issue I was having on 2.6.18 with
> RAID10. Previous to this change, after removing a device from the array
> (with mdadm --remove), physically pulling the device and
> changing/re-inserting, the "Number" of the new device would be
> incremented on top of the highest-present device in the array. Now, it
> resumes its previous place.
>
> Does this look to be 'correct' output for a 14-drive array, which dev 8
> was failed/removed from then "add"'ed? I'm trying to determine why the
> device doesn't get pulled back into the active configuration and
> re-synced. Any comments?

Does this patch help?

Fix count of degraded drives in raid10.

Signed-off-by: Neil Brown <neilb@xxxxxxx>

### Diffstat output
 ./drivers/md/raid10.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff .prev/drivers/md/raid10.c ./drivers/md/raid10.c
--- .prev/drivers/md/raid10.c   2006-10-09 14:18:00.000000000 +1000
+++ ./drivers/md/raid10.c       2006-10-05 20:10:07.000000000 +1000
@@ -2079,7 +2079,7 @@ static int run(mddev_t *mddev)
                disk = conf->mirrors + i;
 
                if (!disk->rdev ||
-                   !test_bit(In_sync, &rdev->flags)) {
+                   !test_bit(In_sync, &disk->rdev->flags)) {
                        disk->head_position = 0;
                        mddev->degraded++;
                }

NeilBrown
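For completeness, the degraded/working counts that this code touches can be eyeballed from userspace with nothing fancier than the following (just a sanity check, before and after applying the patch):

  cat /proc/mdstat
  mdadm --detail /dev/md0 | grep -E 'State :|Devices :'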
Attachment: gzKY3Inxrxoy.gz
Description: GNU Zip compressed data

Attachment: gzHKRDSqUyeA.gz
Description: GNU Zip compressed data