On Thu, 17 Jun 2010 10:40:36 +0100 "Trela, Maciej" <Maciej.Trela@xxxxxxxxx> wrote: > > > > > > > > Another thing is waiting during reshape for metadata update on > > MD_CHANGE_DEVS flag. > > > To roll reshape I've added the following code (instead calling > > md_ubdate_sb()): > > > > Yes, there is a real issue there... > > > > I don't think we ever need the kernel to wait for an external metadata > > handler > > to respond to device changes (apart from failure which is handled > > separately). > > So maybe the best thing is to guard all settings of MD_CHANGE_DEVS with > > if (mddev->persistent) > > > > I think that would be best, but I've make a note to review that later. > > > > Neil, > from what I see in the raid5.c/md.c "native" code uses MD_CHANGE_DEVS > during the reshape if it reaches special points when metadata > write is really needed to update the reshape checkpoint. > In reshape_request(): > /* Cannot proceed until we've updated the superblock */ > .. > set_bit(MD_CHANGE_DEVS, mddev->flags) > > In md_check_recovery() we have: > if (mddev->flags) > md_update_sb() > > Couldn't we follow this logic with MD_CHANGE_DEVS for external metadata? > If not, how to detect the need for migration checkpoint update? Good question. The first question to ask is How does mdmon know when a metadata update is required, and how does it tell md that the metadata update is complete. OK, 2 first questions... For the first I suspect it should watch 'md/reshape_position' (which need to use sysfs_notify for). For the second .... I don't know. - Maybe sync_action could change to 'paused' and mdmon writes 'continue'.... but that is possibly overloading that file too much. - We could have a new sysfs file which just shows paused/active ?? - We could require that mdmon sets 'sync_max' appropriately so that reshape will stop at the right place, and then when mdmon has updated the metadata, it sets a new sync_max value. - As above, but if sync_max is set too high, it is automatically reduced to the place when raid5 finds that it has to stop I think the last one is probably best. Before updating ->reshape_position, raid5 checks ->resync_max and if it is too high for safety it set is lower to a safer value. Then it changes ->reshape_position and calls sysfs_notify. mdmon watches for 'reshape_postion' to change. when it does it updates the metadata and then writes a larger value to ->resync_max. Things can get a little confusing when reshaping to fewer devices as reshape_position decreases, but sync_completed always increases and sync_max is still an 'upper' limit. But it should work OK. Does that seem reasonable? NeilBrown -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html