RE: [PATCH 00/29] OLCE, migrations and raid10 takeover

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



> -----Original Message-----
> From: Neil Brown [mailto:neilb@xxxxxxx]
> Sent: Wednesday, December 29, 2010 11:34 AM
> To: Kwolek, Adam
> Cc: Wojcik, Krzysztof; linux-raid@xxxxxxxxxxxxxxx; Williams, Dan J;
> Ciechanowski, Ed; Neubauer, Wojciech
> Subject: Re: [PATCH 00/29] OLCE, migrations and raid10 takeover
>
> On Tue, 28 Dec 2010 08:49:52 +0000 "Kwolek, Adam"
> <adam.kwolek@xxxxxxxxx>
> wrote:
>
> >
> >
> > > -----Original Message-----
> > > From: Neil Brown [mailto:neilb@xxxxxxx]
> > > Sent: Tuesday, December 28, 2010 3:24 AM
> > > To: Kwolek, Adam
> > > Cc: Wojcik, Krzysztof; linux-raid@xxxxxxxxxxxxxxx; Williams, Dan J;
> > > Ciechanowski, Ed; Neubauer, Wojciech
> > > Subject: Re: [PATCH 00/29] OLCE, migrations and raid10 takeover
> > >
> > > On Mon, 27 Dec 2010 14:56:05 +0000 "Kwolek, Adam"
> > > <adam.kwolek@xxxxxxxxx>
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > Please find below my few comments for current grow/reshape code
> on
> > > devel 3.2.
> > > >
> > > > In super-intel.c, update_reshape_container_disk update is
> defined.
> > > > It is intended to send for container only and operates on all
> devices
> > > /arrays/ in container in one shot.
> > > > This is IMSM incompatible way. IMSM updates disks list in
> metadata
> > > and first device information during first array reshape.
> > > > During second array reshape, device is updated only (using disk
> > > information updated during first reshape)
> > > > As it I am showing above. Container operation is in fact set of
> > > single array operations. This is due to single migration area (per
> > > container),
> > > > where backup is stored. It is related to currently reshaped
> device
> > > and we can have single array in reshape state only.
> > > >
> > > > This drives to conclusion that update should be issued per array
> and
> > > should have device id (dev minor or subdev).
> > > >
> > > > Considering above current implementation of
> > > update_reshape_container_disk is not compatible with imsm.
> > > >
> > > > This is driven by current reshape_container() implementation in
> > > Grow.c.
> > > > The flow goes as follow:
> > > > 1. reshape_container()
> > > >   - metadata update for container (+ sync_metadata)
> > > >   - call in loop reshape_array() for each array in container
> > > >
> > > > 2. reshape_array()
> > > >    - adds disks for each device in container - but it should add
> > > disks to this particular array it is called for (loop is on
> > > reshape_container() already)
> > > >    - reshape_array() - currently doesn't update metadata, if so -
> in
> > > line 1891 sync_metadata() seems to be not necessary /no
> reshape_super()
> > > call/
> > > >    - currently reshape_array() assumes that metadata is ready for
> all
> > > devices in container - this is not imsm compatible /as described
> above/
> > > >
> > > > How mdadm should work to be imsm compatible:
> > > > 1. container operation is set of array operations (not
> atomic/single
> > > container update)
> > > > 2. for every array the same disk set should be used (it is
> covered in
> > > current devel 3.2 code but not in imsm compatible way)
> > > > 3. there is not metadata update after reshape_super() call for
> > > container.
> > > > 4. first array update should update disk list in metadata
> > > > 6. second array update should reuse (+verify) used disk list
> present
> > > in metadata already
> > > > 7. reshape cannot be run in parallel
> > > >
> > > > Main code changes I'm proposing?
> > > > Problem 1.
> > > >         Main code in Grow.c cannot assume when whole metadata /all
> arrays
> > > in container/ will be updated for particular external meta
> > > >         during reshape_super() in reshape_container(). It can issue
> no
> > > update to container/arrays/ due to compatibility reason.
> > > > Resolution 1.
> > > >         reshape_super() has to be called for container and for
> every
> > > array in reshape_array() (i.e.) before new disks will be added to
> array
> > > (and after monitor is running).
> > > >         We can distinguish calls by comparing (as currently), st-
> > > >container_dev and st->devnum.
> > > >         This causes that in reshape_array(), spares cannot be added
> to
> > > all arrays in container at one time (based on metadata
> information),
> > > >         but for current array only (all arrays will be covered by
> loop in
> > > reshape_container()).
> > > >         This means also, that disks cannot be added based on
> metadata
> > > information without adding additional reshape_suped() call in
> > > reshape_array(),
> > > >         as it cannot be assumed that all arrays in container are
> updated
> > > at this moment without this call.
> > > >
> > > > Problem 2.
> > > >         As reshape_container() performs action for all arrays in
> > > container probably we should allow for code reuse from Grow.c by
> > > external meta.
> > > > Resolution 2.
> > > >         In my implementation, I've did it by moving
> manage_reshape() call
> > > to the end of reshape processing. In this function I've manage
> array
> > > size and backward takeover to raid0 /optionally/
> > > >         I think that this would be correct for imsm, but to have
> > > flexibility manage_reshape() call should not be changed. In such
> > > situation additional function can be added:
> > > >         i.e. finalize_reshape() for purposes I've pointed above
> (after
> > > wait_reshape() and before level change action, Grow.c:2045).
> > > >         Action after manage_reshape() should be based on
> manage_reshape()
> > > on condition if it exists and is yes, on return code form this
> call.
> > > >         This allows metadata specific code decide to reuse or not
> /or for
> > > some cases/ as much as it is possible of main reshape code.
> > > >
> > > > Problem 3.
> > > >         After manage_reshape() call we are returning to main
> > > reshape_container() code, I assume that manage_reshape() should
> execute
> > > at some point fork() and return,
> > > >         as code in Grow.c does. This can cause running parallel
> reshapes
> > > using i.e. single backup file.
> > > > Resolution 3.
> > > >         Resolution will depend on particular metadata
> implementation, but
> > > as it is single backup_file used, I can assume that intention is
> > > sequential reshapes run.
> > > >         This can be handled in reshape_super() by waiting for all
> arrays
> > > to be in idle state. This entry condition should be checked in
> > > reshape_super() called in reshape_container()
> > > >         without waiting for state change, and in reshape_super()
> called
> > > in reshape_array() with waiting for idle state.
> > > >
> > > >
> > > > Considering all above I think that container reshape should work
> as
> > > follow (using current devel 3.2 code):
> > > >
> > > > reshape_container()
> > > > {
> > > >   a) reshape_super(container context)
> > > >         - check reshape condition and exit with error when any
> array is
> > > not ready for reshape
> > > >           Checking condition here makes us safe that we can wait
> for
> > > required condition during reshape_super() call in array context
> later
> > > >         - leaving this call here makes possible to implement
> container
> > > update also if metadata requires this
> > > >   b) fork() if reshape_super() call succeed (+).
> > > >         For multi-array container this is required to return to
> console
> > > for sequential work on arrays.
> > > >         When we want to procced for second array, we have to wait
> until
> > > first reshape is finished.
> > > >         This would block console.
> > > >   c) call in loop for all devices reshape_array()
> > > >         1. analyse_change()
> > > >         2. change level and start monitor (if required)
> > > >         3. reshape_super(array context) (+)
> > > >                 - check reshape condition, if any array is in reshape
> state
> > > in container this time wait for reshape end.
> > > >                 - send metadata update and allow for reshape
> processing
> > > >         4. load metadata (container_content()) and based on this
> > > information add disks to md for single array (currently processed)
> > > >         5. set other array parameters to md (including raid_disks)
> > > >         6. start_reshape()
> > > >         7. manage_reshape()(+) -> if metadata requires take
> ownership of
> > > process or ...
> > > >         8. control reshape by Grow.c code using fork()
> > > >            (possible that fork usage can be optional for container,
> as it
> > > is forked already)
> > > >                 - child_XXX()
> > > >                 - wait reshape
> > > >                 - finalize_reshape(for external metadata)(+) - we can
> > > handle size change in common code usage case
> > > >                 - backward takeover
> > > >         9. return to reshape_container() and process next array
> (c.) if
> > > any
> > > > }
> > > >
> > > > Things added to your current design (marked above with (+)):
> > > >         1. additional fork()
> > > >         2. additional reshape_super() in reshape_array()
> > > >         3. change in call to manage_reshape() to allow processing
> in main
> > > code for external metadata
> > > >                 Depends on:
> > > >                         - if manage_reshape() exists
> > > >                         - manage_reshape() return code
> > > >         4. added finalize_reshape()
> > > >
> > > >
> > > > Above flow gives possibility to:
> > > > 1. run reshapes in sequence or
> > > > 2. run reshapes in parallel
> > > > 3. reuse Grow.c code (we can use manage_reshape,
> finalize_reshape,
> > > code in Grow.c or any reasonable combination)
> > > >
> > > >         and all this is up to metadata specific code
> > > (reshape_super()/manage_reshape()/finalize_reshape()).
> > > >
> > > > Please let me know what do you think about this. If you like it,
> I'll
> > > start work on code changes.
> > > >
> > > > BR
> > > > Adam
> > > >
> > >
> > >
> > > Hi Adam,
> > >  thanks for you comments and the obvious effort you have put into
> > >  understanding the new structure.
> > >
> > >  I agree that my code is making some assumption that don't fit
> quite
> > > right
> > >  with IMSM.  However I think they can be resolved with relatively
> small
> > >  changes.
> > >
> > >  I don't think we should support parallel reshaping at all.  md
> doesn't
> > > allow
> > >  multiple arrays that share a device to be reshaped at the same
> time
> > > anyway,
> > >  and running parallel reshapes would result in horrible
> performance.
> > > So we
> > >  should only support multiple reshapes in series.  Therefore on a
> > > single
> > >  'fork' call is needed.  The forked child then handles all of the
> > > reshaping.
> > >
> > >  I only want one call to ->reshape_super.  It essentially tells the
> > > metadata
> > >  handler that the user has requested a reshape, and as the user
> only
> > > made one
> > >  request, only one call to ->reshape_super should be made.
> > >
> > >  I had assume that the one call would update the metadata for all
> of
> > > the
> > >  arrays, but you say that is incorrect for IMSM. In that case it
> should
> > > just
> > >  update the metadata for the first array.  When that reshape
> completes,
> > > the
> > >  metadata will be told (in ->set_array_state) and it should at that
> > > point
> > >  update the metadata to reflect that the second array is undergoing
> > > reshape.
> > >
> > >  This requires slightly different behaviour in reshape_container.
> > > Instead of
> > >  calling ->container_content just once and then reshaping each
> array it
> > > should
> > >  repeatedly:
> > >     - load the metadata
> > >     - call ->container_content, find the first array which is ready
> for
> > >       reshape
> > >     - process that reshape so that it completes
> > >     - ping_monitor to ensure that the metadata is updated
> > >  That should be quite general and compatible with IMSM.
> > >
> > >  I am not entirely sure that you wanted finalize_reshape to do.  If
> it
> > > was to
> > >  update the metadata to show that the reshape was complete, then
> that
> > > should
> > >  be done in ->set_array_state.  If it was to e.g. convert a RAID4
> back
> > > to
> > >  RAID0, that is already done towards the end of reshape_array.
> > >
> > > Thanks,
> > > NeilBrown
> >
> > Hi,
> >
> > Your approach looks ok, and should lead us to imsm compatible
> implementation.
> >
> > The only problem I can see is that set_array_state is called for
> particular array and I would avoid modifying another array metadata
> information ther.
>
> I can see why you might want to avoid updating one array from a call to
> update a different array, but I think it is the correct thing to do in
> this
> case.
>
> What we are really doing here is changing the whole container (both
> arrays).
> We are making them both bigger in number of devices.
> The rules for updating the metadata require that this happens in
> multiple
> steps and at different times.  One of the steps is that when the first
> array
> finishes a reshape, the next array can start.  So in order to impose
> the
> container-wide change, we need to make a change to the second array
> when
> finalising the reshape of the first.
> I'm not sure if I've said that as clearly as I would like to, but the
> point
> remains:  I really think preparing the second array for reshape by a
> set_array_state call on the first array is the right thing to do.
>
>
> > This means that sequence of set_array_state is required to archieve
> final/required metadata state.
> >
> > so sequence can be:
> >     1. mdadm: ping_monitor
> >     2. monitor (in loop for active_array): set_array_state(a1:
> finalize current reshape), set_array_state(a2: start new reshape in
> metadata, when previous is finished)
>
> So this will be a single set_array_state call.  IMSM *knows* that when
> the
> first array has been reshaped to more devices, the second array must
> also be
> reshaped, so it will make that change transparently.
>
> >
> > If mdadm will not follow arrays order in mdmon we will have
> situation:
> >     1. mdadm: ping_monitor
> >     2. monitor (in loop for active_array): set_array_state(a2:
> nothing to do, a1 is under reshape), set_array_state(a1: finalize
> current reshape)
> >
> > This leads me to 2 pings sent by mdadm:
> >     1. mdadm: ping_monitor
> >     2. monitor (in loop for active_array): set_array_state(a2:
> nothing to do, a1 is under reshape), set_array_state(a1: finalize
> current reshape)
> >     3. mdadm: ping_monitor
> >     4. monitor (in loop for active_array): set_array_state(a2: start
> new reshape in metadata, when previous is finished),
> set_array_state(a1: nothing to do)
> >
> > If there is more arrays in container (not imsm case) it is important
> that first ping finalizes current reshape and second ping starts new
> one.
>
> Only one ping should be needed.
>
> >
> > Regarding finalize_reshape(): If metadata specific implementation is
> similar or the same as current reshape_array() implementation, there is
> no need to duplicate it in manage_reshape(). We can proceed in main
> code, but at the end, array size has to be updated for external
> metadata (md has been set to external size management). This is main
> purpose of finalize_reshape(), as metadata specific code can read new
> size from metadata and it can set it in md.
>
> I hadn't thought about changing the array size.  Thanks for remembering
> it.
>
> I think the common code should do this.  When the reshape completes
> mdadm
> should re-read the metadata and use container_content to find out the
> expected size of the array, and then tell md what this size is.
>
> So reshape_array will ping_monitor and re-read the metadata.  That
> means that
> reshape_container doesn't need to do either of these.  After calling
> reshape_array on one array, it will call container_content(st, NULL)
> and look
> to see if any other arrays need to be reshaped.
>
> >
> > If you state that reshape control for external metadata always has to
> be put in to manage_reshape(), we can forget about finalize_reshape().
> >
>
> I'm not exactly sure I know what you mean by that first part, but I
> think the
> answer is yet:  I do want any external metadata handler to have a
> manage_reshape, and I don't want to have a finalize_reshape at all.
>
> Thanks,
> NeilBrown


Thank you for information, I think I know everything what I need at this moment.
In a few days, you'll get new patches for OLCE.

BR
Adam

--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux RAID Wiki]     [ATA RAID]     [Linux SCSI Target Infrastructure]     [Linux Block]     [Linux IDE]     [Linux SCSI]     [Linux Hams]     [Device Mapper]     [Device Mapper Cryptographics]     [Kernel]     [Linux Admin]     [Linux Net]     [GFS]     [RPM]     [git]     [Yosemite Forum]


  Powered by Linux