On Tue, 15 Dec 2009 11:03:06 -0700
Dan Williams <dan.j.williams@xxxxxxxxx> wrote:

> On Mon, Dec 14, 2009 at 9:19 PM, Dan Williams <dan.j.williams@xxxxxxxxx> wrote:
> > On second thought, if we get to activate_spare() it's already too
> > late.  Moving this to mdadm at assembly time (prior to setting
> > readonly) is a better approach.
> >
>
> Problem.  slot_store() in the array inactive case currently does:
>
>         /* assume it is working */
>         clear_bit(Faulty, &rdev->flags);
>         clear_bit(WriteMostly, &rdev->flags);
>         set_bit(In_sync, &rdev->flags);
>         sysfs_notify_dirent(rdev->sysfs_state);
>
> i.e. sets the disk insync even if we specified a recovery_start <
> MaxSector.  If userspace can guarantee that the array stays inactive
> then it can write to 'recovery_start' after 'slot' and catch attempts
> to cold_add() out-of-sync disks on pre-2.6.33 kernels, but that gives
> a window of invalid configuration.  The other fix is to remove the
> set_bit(In_sync), and then for the pre-2.6.33 case userspace would
> need to disallow adding out-of-sync disks and force them through the
> hot_add() case.  This is how mdadm/mdmon currently operates, but that
> is a surprising ABI quirk when switching to/from 2.6.33.  A third
> option is to allow recovery_start_store to be modified while the array
> is read only.  Although not my favorite, because it requires tricky
> mdmon logic to catch activate_spare() attempts before the monitor
> thread starts touching the array, it has the benefit of not changing
> any old behavior and no window of invalid configuration.  Thoughts??

I'm tempted to wait a bit longer and see if you find a solution, as you
seem to be progressing quite well :-)

But I won't.

I imagine there are two cases:
 1/ assembling an array from devices, some of which might be partially
    recovered,
 2/ re-adding a device to an array which is already active.

In the first case, mdadm would:
 - add the disk (write to new_dev)
 - set the slot - this sets 'In_sync'
 - set the recovery_start - this clears 'In_sync' as required.

In the second case either mdadm or mdmon would:
 - write 'frozen' to sync_action, which would inhibit any call to
   remove_and_add_spares
 - add the disk
 - set recovery_start
 - set the slot
 - write 'recover' to sync_action

It is unfortunate that the setting of 'slot' and 'recovery_start' must
be in different orders in the different cases, but maybe that isn't a
tragedy.

Possibly I could change slot_store in the pers==NULL case to not set
In_sync if recovery_offset were not MaxSector, but I'm not sure it is
worth the effort.

Does that answer your concerns?

NeilBrown
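
The ordering dependency in the first (assembly) case can be illustrated with a small toy model in C. This is only a sketch of the flag behaviour as described in this thread, not kernel code: the struct, the `toy_*` function names and `MAX_SECTOR` are hypothetical stand-ins, and the model assumes nothing beyond "setting the slot on an inactive array sets In_sync" and "setting a recovery_start below MaxSector clears it".

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's MaxSector ("fully recovered"). */
#define MAX_SECTOR UINT64_MAX

/* Toy per-device state; only the fields needed for the illustration. */
struct toy_rdev {
    bool in_sync;            /* models the In_sync flag */
    int slot;                /* -1 means no slot assigned */
    uint64_t recovery_start; /* MAX_SECTOR means no recovery needed */
};

static void toy_rdev_init(struct toy_rdev *r)
{
    r->in_sync = false;
    r->slot = -1;
    r->recovery_start = MAX_SECTOR;
}

/* Models writing 'slot' while the array is inactive:
 * the device is assumed to be working, so In_sync is set. */
static void toy_slot_store_inactive(struct toy_rdev *r, int slot)
{
    r->slot = slot;
    r->in_sync = true;
}

/* Models writing 'recovery_start': a value below MAX_SECTOR
 * means the device is only partially recovered, so In_sync is cleared. */
static void toy_recovery_start_store(struct toy_rdev *r, uint64_t start)
{
    r->recovery_start = start;
    if (start < MAX_SECTOR)
        r->in_sync = false;
}

/* Order suggested for assembly: slot first, then recovery_start.
 * Returns the final In_sync state. */
static bool toy_assembly_order_in_sync(uint64_t start)
{
    struct toy_rdev r;

    toy_rdev_init(&r);
    toy_slot_store_inactive(&r, 0);
    toy_recovery_start_store(&r, start);
    return r.in_sync;
}

/* Reversed order on an inactive array: recovery_start first, then slot.
 * The later slot write sets In_sync again. */
static bool toy_reversed_order_in_sync(uint64_t start)
{
    struct toy_rdev r;

    toy_rdev_init(&r);
    toy_recovery_start_store(&r, start);
    toy_slot_store_inactive(&r, 0);
    return r.in_sync;
}
```

In this model, a partially recovered device (for example `start = 1024`) ends up with In_sync cleared when 'slot' is written before 'recovery_start', and with In_sync wrongly set when the order is reversed, which is why the assembly case writes 'slot' first.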