On Tue, 28 Dec 2010 08:49:52 +0000 "Kwolek, Adam" <adam.kwolek@xxxxxxxxx> wrote: > > > > -----Original Message----- > > From: Neil Brown [mailto:neilb@xxxxxxx] > > Sent: Tuesday, December 28, 2010 3:24 AM > > To: Kwolek, Adam > > Cc: Wojcik, Krzysztof; linux-raid@xxxxxxxxxxxxxxx; Williams, Dan J; > > Ciechanowski, Ed; Neubauer, Wojciech > > Subject: Re: [PATCH 00/29] OLCE, migrations and raid10 takeover > > > > On Mon, 27 Dec 2010 14:56:05 +0000 "Kwolek, Adam" > > <adam.kwolek@xxxxxxxxx> > > wrote: > > > > > Hi, > > > > > > Please find below my few comments for current grow/reshape code on > > devel 3.2. > > > > > > In super-intel.c, update_reshape_container_disk update is defined. > > > It is intended to send for container only and operates on all devices > > /arrays/ in container in one shot. > > > This is IMSM incompatible way. IMSM updates disks list in metadata > > and first device information during first array reshape. > > > During second array reshape, device is updated only (using disk > > information updated during first reshape) > > > As it I am showing above. Container operation is in fact set of > > single array operations. This is due to single migration area (per > > container), > > > where backup is stored. It is related to currently reshaped device > > and we can have single array in reshape state only. > > > > > > This drives to conclusion that update should be issued per array and > > should have device id (dev minor or subdev). > > > > > > Considering above current implementation of > > update_reshape_container_disk is not compatible with imsm. > > > > > > This is driven by current reshape_container() implementation in > > Grow.c. > > > The flow goes as follow: > > > 1. reshape_container() > > > - metadata update for container (+ sync_metadata) > > > - call in loop reshape_array() for each array in container > > > > > > 2. reshape_array() > > > - adds disks for each device in container - but it should add > > disks to this particular array it is called for (loop is on > > reshape_container() already) > > > - reshape_array() - currently doesn't update metadata, if so - in > > line 1891 sync_metadata() seems to be not necessary /no reshape_super() > > call/ > > > - currently reshape_array() assumes that metadata is ready for all > > devices in container - this is not imsm compatible /as described above/ > > > > > > How mdadm should work to be imsm compatible: > > > 1. container operation is set of array operations (not atomic/single > > container update) > > > 2. for every array the same disk set should be used (it is covered in > > current devel 3.2 code but not in imsm compatible way) > > > 3. there is not metadata update after reshape_super() call for > > container. > > > 4. first array update should update disk list in metadata > > > 6. second array update should reuse (+verify) used disk list present > > in metadata already > > > 7. reshape cannot be run in parallel > > > > > > Main code changes I'm proposing? > > > Problem 1. > > > Main code in Grow.c cannot assume when whole metadata /all arrays > > in container/ will be updated for particular external meta > > > during reshape_super() in reshape_container(). It can issue no > > update to container/arrays/ due to compatibility reason. > > > Resolution 1. > > > reshape_super() has to be called for container and for every > > array in reshape_array() (i.e.) before new disks will be added to array > > (and after monitor is running). > > > We can distinguish calls by comparing (as currently), st- > > >container_dev and st->devnum. > > > This causes that in reshape_array(), spares cannot be added to > > all arrays in container at one time (based on metadata information), > > > but for current array only (all arrays will be covered by loop in > > reshape_container()). > > > This means also, that disks cannot be added based on metadata > > information without adding additional reshape_suped() call in > > reshape_array(), > > > as it cannot be assumed that all arrays in container are updated > > at this moment without this call. > > > > > > Problem 2. > > > As reshape_container() performs action for all arrays in > > container probably we should allow for code reuse from Grow.c by > > external meta. > > > Resolution 2. > > > In my implementation, I've did it by moving manage_reshape() call > > to the end of reshape processing. In this function I've manage array > > size and backward takeover to raid0 /optionally/ > > > I think that this would be correct for imsm, but to have > > flexibility manage_reshape() call should not be changed. In such > > situation additional function can be added: > > > i.e. finalize_reshape() for purposes I've pointed above (after > > wait_reshape() and before level change action, Grow.c:2045). > > > Action after manage_reshape() should be based on manage_reshape() > > on condition if it exists and is yes, on return code form this call. > > > This allows metadata specific code decide to reuse or not /or for > > some cases/ as much as it is possible of main reshape code. > > > > > > Problem 3. > > > After manage_reshape() call we are returning to main > > reshape_container() code, I assume that manage_reshape() should execute > > at some point fork() and return, > > > as code in Grow.c does. This can cause running parallel reshapes > > using i.e. single backup file. > > > Resolution 3. > > > Resolution will depend on particular metadata implementation, but > > as it is single backup_file used, I can assume that intention is > > sequential reshapes run. > > > This can be handled in reshape_super() by waiting for all arrays > > to be in idle state. This entry condition should be checked in > > reshape_super() called in reshape_container() > > > without waiting for state change, and in reshape_super() called > > in reshape_array() with waiting for idle state. > > > > > > > > > Considering all above I think that container reshape should work as > > follow (using current devel 3.2 code): > > > > > > reshape_container() > > > { > > > a) reshape_super(container context) > > > - check reshape condition and exit with error when any array is > > not ready for reshape > > > Checking condition here makes us safe that we can wait for > > required condition during reshape_super() call in array context later > > > - leaving this call here makes possible to implement container > > update also if metadata requires this > > > b) fork() if reshape_super() call succeed (+). > > > For multi-array container this is required to return to console > > for sequential work on arrays. > > > When we want to procced for second array, we have to wait until > > first reshape is finished. > > > This would block console. > > > c) call in loop for all devices reshape_array() > > > 1. analyse_change() > > > 2. change level and start monitor (if required) > > > 3. reshape_super(array context) (+) > > > - check reshape condition, if any array is in reshape state > > in container this time wait for reshape end. > > > - send metadata update and allow for reshape processing > > > 4. load metadata (container_content()) and based on this > > information add disks to md for single array (currently processed) > > > 5. set other array parameters to md (including raid_disks) > > > 6. start_reshape() > > > 7. manage_reshape()(+) -> if metadata requires take ownership of > > process or ... > > > 8. control reshape by Grow.c code using fork() > > > (possible that fork usage can be optional for container, as it > > is forked already) > > > - child_XXX() > > > - wait reshape > > > - finalize_reshape(for external metadata)(+) - we can > > handle size change in common code usage case > > > - backward takeover > > > 9. return to reshape_container() and process next array (c.) if > > any > > > } > > > > > > Things added to your current design (marked above with (+)): > > > 1. additional fork() > > > 2. additional reshape_super() in reshape_array() > > > 3. change in call to manage_reshape() to allow processing in main > > code for external metadata > > > Depends on: > > > - if manage_reshape() exists > > > - manage_reshape() return code > > > 4. added finalize_reshape() > > > > > > > > > Above flow gives possibility to: > > > 1. run reshapes in sequence or > > > 2. run reshapes in parallel > > > 3. reuse Grow.c code (we can use manage_reshape, finalize_reshape, > > code in Grow.c or any reasonable combination) > > > > > > and all this is up to metadata specific code > > (reshape_super()/manage_reshape()/finalize_reshape()). > > > > > > Please let me know what do you think about this. If you like it, I'll > > start work on code changes. > > > > > > BR > > > Adam > > > > > > > > > Hi Adam, > > thanks for you comments and the obvious effort you have put into > > understanding the new structure. > > > > I agree that my code is making some assumption that don't fit quite > > right > > with IMSM. However I think they can be resolved with relatively small > > changes. > > > > I don't think we should support parallel reshaping at all. md doesn't > > allow > > multiple arrays that share a device to be reshaped at the same time > > anyway, > > and running parallel reshapes would result in horrible performance. > > So we > > should only support multiple reshapes in series. Therefore on a > > single > > 'fork' call is needed. The forked child then handles all of the > > reshaping. > > > > I only want one call to ->reshape_super. It essentially tells the > > metadata > > handler that the user has requested a reshape, and as the user only > > made one > > request, only one call to ->reshape_super should be made. > > > > I had assume that the one call would update the metadata for all of > > the > > arrays, but you say that is incorrect for IMSM. In that case it should > > just > > update the metadata for the first array. When that reshape completes, > > the > > metadata will be told (in ->set_array_state) and it should at that > > point > > update the metadata to reflect that the second array is undergoing > > reshape. > > > > This requires slightly different behaviour in reshape_container. > > Instead of > > calling ->container_content just once and then reshaping each array it > > should > > repeatedly: > > - load the metadata > > - call ->container_content, find the first array which is ready for > > reshape > > - process that reshape so that it completes > > - ping_monitor to ensure that the metadata is updated > > That should be quite general and compatible with IMSM. > > > > I am not entirely sure that you wanted finalize_reshape to do. If it > > was to > > update the metadata to show that the reshape was complete, then that > > should > > be done in ->set_array_state. If it was to e.g. convert a RAID4 back > > to > > RAID0, that is already done towards the end of reshape_array. > > > > Thanks, > > NeilBrown > > Hi, > > Your approach looks ok, and should lead us to imsm compatible implementation. > > The only problem I can see is that set_array_state is called for particular array and I would avoid modifying another array metadata information ther. I can see why you might want to avoid updating one array from a call to update a different array, but I think it is the correct thing to do in this case. What we are really doing here is changing the whole container (both arrays). We are making them both bigger in number of devices. The rules for updating the metadata require that this happens in multiple steps and at different times. One of the steps is that when the first array finishes a reshape, the next array can start. So in order to impose the container-wide change, we need to make a change to the second array when finalising the reshape of the first. I'm not sure if I've said that as clearly as I would like to, but the point remains: I really think preparing the second array for reshape by a set_array_state call on the first array is the right thing to do. > This means that sequence of set_array_state is required to archieve final/required metadata state. > > so sequence can be: > 1. mdadm: ping_monitor > 2. monitor (in loop for active_array): set_array_state(a1: finalize current reshape), set_array_state(a2: start new reshape in metadata, when previous is finished) So this will be a single set_array_state call. IMSM *knows* that when the first array has been reshaped to more devices, the second array must also be reshaped, so it will make that change transparently. > > If mdadm will not follow arrays order in mdmon we will have situation: > 1. mdadm: ping_monitor > 2. monitor (in loop for active_array): set_array_state(a2: nothing to do, a1 is under reshape), set_array_state(a1: finalize current reshape) > > This leads me to 2 pings sent by mdadm: > 1. mdadm: ping_monitor > 2. monitor (in loop for active_array): set_array_state(a2: nothing to do, a1 is under reshape), set_array_state(a1: finalize current reshape) > 3. mdadm: ping_monitor > 4. monitor (in loop for active_array): set_array_state(a2: start new reshape in metadata, when previous is finished), set_array_state(a1: nothing to do) > > If there is more arrays in container (not imsm case) it is important that first ping finalizes current reshape and second ping starts new one. Only one ping should be needed. > > Regarding finalize_reshape(): If metadata specific implementation is similar or the same as current reshape_array() implementation, there is no need to duplicate it in manage_reshape(). We can proceed in main code, but at the end, array size has to be updated for external metadata (md has been set to external size management). This is main purpose of finalize_reshape(), as metadata specific code can read new size from metadata and it can set it in md. I hadn't thought about changing the array size. Thanks for remembering it. I think the common code should do this. When the reshape completes mdadm should re-read the metadata and use container_content to find out the expected size of the array, and then tell md what this size is. So reshape_array will ping_monitor and re-read the metadata. That means that reshape_container doesn't need to do either of these. After calling reshape_array on one array, it will call container_content(st, NULL) and look to see if any other arrays need to be reshaped. > > If you state that reshape control for external metadata always has to be put in to manage_reshape(), we can forget about finalize_reshape(). > I'm not exactly sure I know what you mean by that first part, but I think the answer is yet: I do want any external metadata handler to have a manage_reshape, and I don't want to have a finalize_reshape at all. Thanks, NeilBrown -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html