This patch has resolved the immediate issue I was having on 2.6.18 with
RAID10. Prior to this change, after removing a device from the array
(with mdadm --remove), physically pulling the device, and
swapping/re-inserting it, the "Number" of the new device would be
assigned one higher than the highest number already present in the
array. Now it resumes its previous slot.
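For concreteness, the sequence is roughly the following (the /dev/md0
array name here is just a stand-in; the device names match the listing
further down):

  mdadm /dev/md0 --fail /dev/dm-8
  mdadm /dev/md0 --remove /dev/dm-8
  # physically pull the drive, swap it, re-insert the replacement
  mdadm /dev/md0 --add /dev/dm-8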
Does this look like correct output for a 14-drive array from which
device 8 was failed, removed, and then added back? I'm trying to
determine why the device doesn't get pulled back into the active
configuration and re-synced. Any comments?
Thanks!
/eli
For example, currently when device dm-8 is removed and added back, it shows up like this:
    Number   Major   Minor   RaidDevice   State
       0      253       0         0       active sync   /dev/dm-0
       1      253       1         1       active sync   /dev/dm-1
       2      253       2         2       active sync   /dev/dm-2
       3      253       3         3       active sync   /dev/dm-3
       4      253       4         4       active sync   /dev/dm-4
       5      253       5         5       active sync   /dev/dm-5
       6      253       6         6       active sync   /dev/dm-6
       7      253       7         7       active sync   /dev/dm-7
       8        0       0         8       removed
       9      253       9         9       active sync   /dev/dm-9
      10      253      10        10       active sync   /dev/dm-10
      11      253      11        11       active sync   /dev/dm-11
      12      253      12        12       active sync   /dev/dm-12
      13      253      13        13       active sync   /dev/dm-13

       8      253       8         -       spare         /dev/dm-8
Previously, however, it would come back with the "Number" as 14 rather
than 8 as it should. Shortly thereafter things got completely out of
whack, in addition to just not working properly :) Now I just have to
figure out how to get the re-introduced drive to participate in the
array again like it should.
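For reference, this is roughly what I've been trying in order to kick
the spare back into service (again, /dev/md0 is just a placeholder for
the real array device):

  mdadm /dev/md0 --remove /dev/dm-8
  mdadm /dev/md0 --re-add /dev/dm-8

but the device just comes back as a spare rather than resyncing into
slot 8.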
Eli Stair wrote:
I'm actually seeing similar behaviour on RAID10 (2.6.18), where, after
removing a drive from an array, re-adding it sometimes results in it
still being listed as a faulty spare and not being taken for resync.
In the same scenario, after swapping drives, doing a fail and remove
followed by an 'add' doesn't work; only a re-add will even get the
drive listed by mdadm.
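To spell that out, after the fail + remove and the physical swap, the
two variants look roughly like this (/dev/md0 again stands in for the
array):

  mdadm /dev/md0 --add /dev/dm-8      # doesn't work; the drive isn't even listed
  mdadm /dev/md0 --re-add /dev/dm-8   # gets it listed, but only as a faulty spare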
What failure mode/symptoms does this patch resolve?
Is it possible this affects the RAID10 module/mode as well? If not,
I'll start a new thread for that. I'm testing this patch to see if it
does remedy the situation on RAID10, and will update after some
significant testing.
/eli
NeilBrown wrote:
> There is a nasty bug in md in 2.6.18 affecting at least raid1.
> This fixes it (and has already been sent to stable@xxxxxxxxxx).
>
> ### Comments for Changeset
>
> This fixes a bug introduced in 2.6.18.
>
> If a drive is added to a raid1 using older tools (mdadm-1.x or
> raidtools) then it will be included in the array without any resync
> happening.
>
> It has been submitted for 2.6.18.1.
>
>
> Signed-off-by: Neil Brown <neilb@xxxxxxx>
>
> ### Diffstat output
> ./drivers/md/md.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff .prev/drivers/md/md.c ./drivers/md/md.c
> --- .prev/drivers/md/md.c 2006-09-29 11:51:39.000000000 +1000
> +++ ./drivers/md/md.c 2006-10-05 16:40:51.000000000 +1000
> @@ -3849,6 +3849,7 @@ static int hot_add_disk(mddev_t * mddev,
>  	}
>  	clear_bit(In_sync, &rdev->flags);
>  	rdev->desc_nr = -1;
> +	rdev->saved_raid_disk = -1;
>  	err = bind_rdev_to_array(rdev, mddev);
>  	if (err)
>  		goto abort_export;