Hi Neil, 1) after looking at this more I see the following: on ADD_NEW_DISK sequence (to a running array, not during assembly) in the kernel, the desc_nr is loaded from the superblock of the disk to be added in super_1_load(): if (sb->level == cpu_to_le32(LEVEL_MULTIPATH)) rdev->desc_nr = -1; else rdev->desc_nr = le32_to_cpu(sb->dev_number); And later, in bind_rdev_to_array() it is indeed checked that desc_nr is unique. However, at least for 1.2 arrays, I believe this is too restrictive, don't you think? If the raid slot (not desc_nr) of the device being re-added is *not occupied* yet, can't we just select a free desc_nr for the new disk on that path? Or perhaps, mdadm on the re-add path can select a free desc_nr (disc.number) for it (just as it does for --add), after ensuring that the slot is not occupied yet? Where it is better to do it? Otherwise, the re-add fails, while it can perfectly succeed (only pick a different desc_nr). 2) About your suggestion of setting the In_sync flag via sysfs, I think it's too dangerous to blindly set it, so I am not going to do it. Let the kernel look at event count and decide. 3) So it looks like at present, my best way to add (not re-add) a disk into a specific slot is the following: - borrow some code from mdadm to initialize the superblock of the new disk, as belonging to this array (init_super). - use the sysfs to do what you suggested before: echo frozen > sync_action echo $major:$minor > new_dev echo $slot > dev-$DEVNAME/slot echo idle > sync_action Does this makes sense? Thanks! Alex. On Wed, Nov 2, 2011 at 12:52 AM, NeilBrown <neilb@xxxxxxx> wrote: > On Tue, 1 Nov 2011 18:26:14 +0200 Alexander Lyakas <alex.bolshoy@xxxxxxxxx> > wrote: > >> Hi Neil, >> I hope you don't find my questions too pesky; I hope as time goes, I >> will be able to contribute as well, and not just ask for information. > > Enquiring minds should always be encouraged! > >> >> What you suggested works great, but only if the drive already has a >> valid superblock. This is according to md_import_device(), which calls >> super_1_load(). Although it calls it with NULL refdev, but it still >> performs basic validity checks of the superblock. So it is needed to >> have functionality of write_init_super() of mdadm (perhaps can be an >> mdadm operation, not sure how dangerous this is to expose such >> functionality). > > Correct. With v1.x metadata - and even more so with DDF and IMSM - the > kernel does not know everything about the metadata and cannot create new > superlocks. At best it can edit what is there. > > As for adding things to mdadm - you just need a genuine use-case. > I can see it could be valid to add a '--role' option to --add so you can add > a device to a specific slot. I doubt it would be used much, but if there was > a concrete need I wouldn't object. > > >> >> Another question I was digging for, is to find a spot where kernel >> determines whether it is going to do a bitmap-based reconstruction or >> a full reconstruction. Can you verify that I found it correctly, or I >> am way off? >> >> In super_1_validate(): >> if (ev1 < mddev->bitmap->events_cleared) >> return 0; >> This means that the array bitmap was cleared after the drive was >> detached, so we bail right out, and do not set rdev->raid_disk, and >> leave all flags off. In that case, eventually, we will need full >> reconstruction. >> Otherwise, after we decide that this drive is an active device (not >> failed or spare): >> if ((le32_to_cpu(sb->feature_map) & MD_FEATURE_RECOVERY_OFFSET)) >> rdev->recovery_offset = le64_to_cpu(sb->recovery_offset); >> else >> set_bit(In_sync, &rdev->flags); >> Here we might or might not set the In_sync flag. >> > > Yes. 'In_sync' effectively means that recovery_offset == MAX. > > >> Then in slot_store() and in add_new_disk(), we set the important flag >> for that matter: >> if (test_bit(In_sync, &rdev->flags)) >> rdev->saved_raid_disk = slot; >> else >> rdev->saved_raid_disk = -1; >> >> And later, in hot_add_disk (like raid5_add_disk, raid1_add_disk), we >> check this flag and set/not set the fullsync flag. And in >> raid1_add_disk, like you said, the slot doesn't matter, any valid >> saved_raid_disk is good. >> >> Is this correct? If yes, then in the sequence you suggested, we will >> always do full reconstruction. Because new_dev_store() and >> slot_store() sequence do not call validate_super(), so In_sync is >> never set, so saved_raid_disk remains -1. This is perfectly fine. > > Yes, this is all correct. > In the sequence I suggested you could add > echo insync > dev-$DEVNAME/state > before > echo $slot > dev-$DEVNAME/slot > > and it would result in saved_raid_disk being set. > > NeilBrown > > -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html