On Wed, 18 May 2011 10:43:47 -0400 "Schmidt, Annemarie"
<Annemarie.Schmidt@xxxxxxxxxxx> wrote:

> Hi!
>
> I have a 2 disk raid1 data array. As a result of other testing, the device info
> in the superblock for one of the partners, /dev/sdc2, ended up being in slot 3
> of the device info array:
>
> [root@typhon ~]# mdadm --detail /dev/md21
> /dev/md21:
>         Version : 1.2
>   Creation Time : Mon May 9 11:19:43 2011
>      Raid Level : raid1
>      Array Size : 5241844 (5.00 GiB 5.37 GB)
>   Used Dev Size : 5241844 (5.00 GiB 5.37 GB)
>    Raid Devices : 2
>   Total Devices : 2
>     Persistence : Superblock is persistent
>
>   Intent Bitmap : Internal
>
>     Update Time : Thu May 12 15:51:50 2011
>           State : active
>  Active Devices : 2
> Working Devices : 2
>  Failed Devices : 0
>   Spare Devices : 0
>
>            Name : typhon.mno.stratus.com:21 (local to host typhon.mno.stratus.com)
>            UUID : 996d993f:baac367a:8b154ba9:43e56cff
>          Events : 687
>
>     Number   Major   Minor   RaidDevice State
> -->    3      65       34        0      active sync   /dev/sdc2
>        2      65       82        1      active sync   /dev/sdk2
>
> When I remove /dev/sdk2 and then re-add it, the re-add fails:
>
> >> [root@typhon ~]# mdadm /dev/md21 -f /dev/sdk2 -r /dev/sdk2
> mdadm: set /dev/sdk2 faulty in /dev/md21
> mdadm: hot removed /dev/sdk2 from /dev/md21
>
> >> [root@typhon ~]# mdadm /dev/md21 -a /dev/sdk2
> mdadm: /dev/sdk2 reports being an active member for /dev/md21, but a --re-add
> fails.
> mdadm: not performing --add as that would convert /dev/sdk2 in to a spare.
> mdadm: To make this a spare, use "mdadm --zero-superblock /dev/sdk2" first.
>
> I believe the re-add fails because the enough_fd function (util.c) is not
> searching deep enough into the dev_info array with this line of code:
>
>     for (i=0; i<array.raid_disks + array.nr_disks; i++)
>
> array.raid_disks = 2 and array.nr_disks = 1, so for this particular md
> device it is only looking at slots 0-2.
> I believe the code needs to be changed to look at all possible dev_info array
> slots, taking into account the version of the superblock (like the Detail
> function does in Detail.c).
>
> Do folks agree?
>

I do - largely.

I think there might be a better, more general way to control the loop though.
Could you try this please?

Thanks,
NeilBrown

diff --git a/util.c b/util.c
index 1056ae4..d005e0a 100644
--- a/util.c
+++ b/util.c
@@ -370,10 +370,14 @@ int enough_fd(int fd)
 	    array.raid_disks <= 0)
 		return 0;
 	avail = calloc(array.raid_disks, 1);
-	for (i=0; i<array.raid_disks + array.nr_disks; i++) {
+	for (i=0; i < 1024 && array.raid_disks > 0; i++) {
 		disk.number = i;
 		if (ioctl(fd, GET_DISK_INFO, &disk) != 0)
 			continue;
+		if (disk.major == 0 && disk.minor == 0)
+			continue;
+		array.raid_disks--;
+
 		if (! (disk.state & (1<<MD_DISK_SYNC)))
 			continue;
 		if (disk.raid_disk < 0 || disk.raid_disk >= array.raid_disks)
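
To make the difference between the two loop bounds concrete, here is a small
stand-alone sketch that models the device table as a plain array instead of
calling GET_DISK_INFO. Everything in it is hypothetical and for illustration
only: struct slot, count_sync_old and count_sync_scan are not mdadm names. The
second function counts down a separate copy of nr_disks and leaves raid_disks
untouched, which is a choice of this sketch and not something taken from the
patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one slot of the array's device table.
 * present == 0 models a slot that reports major 0 and minor 0. */
struct slot {
	int present;    /* non-zero if a device occupies this slot */
	int in_sync;    /* non-zero if the device is marked in sync */
	int raid_disk;  /* role of the device within the array */
};

#define MAX_SLOTS 1024  /* same fixed upper bound as in the patch */

/* Old bound: only slots 0 .. raid_disks + nr_disks - 1 are examined. */
int count_sync_old(const struct slot *slots, int raid_disks, int nr_disks)
{
	int found = 0;

	assert(slots != NULL && raid_disks > 0);
	for (int i = 0; i < raid_disks + nr_disks && i < MAX_SLOTS; i++) {
		if (!slots[i].present)
			continue;
		if (!slots[i].in_sync)
			continue;
		if (slots[i].raid_disk < 0 || slots[i].raid_disk >= raid_disks)
			continue;
		found++;
	}
	return found;
}

/* Wider bound: scan up to MAX_SLOTS, but stop as soon as every device
 * that is present has been seen. */
int count_sync_scan(const struct slot *slots, int raid_disks, int nr_disks)
{
	int found = 0;
	int remaining = nr_disks;  /* separate countdown; raid_disks stays intact */

	assert(slots != NULL && raid_disks > 0);
	for (int i = 0; i < MAX_SLOTS && remaining > 0; i++) {
		if (!slots[i].present)
			continue;
		remaining--;
		if (!slots[i].in_sync)
			continue;
		if (slots[i].raid_disk < 0 || slots[i].raid_disk >= raid_disks)
			continue;
		found++;
	}
	return found;
}
```

With a table that matches the report after /dev/sdk2 has been removed (only
slot 3 occupied, in sync, raid_disk 0, raid_disks = 2, nr_disks = 1),
count_sync_old returns 0 because it stops at slot 2, while count_sync_scan
returns 1.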