RE: Suspend_hi management during reshape


 




> -----Original Message-----
> From: linux-raid-owner@xxxxxxxxxxxxxxx [mailto:linux-raid-owner@xxxxxxxxxxxxxxx] On Behalf Of Kwolek, Adam
> Sent: Thursday, December 09, 2010 5:00 PM
> To: Neil Brown
> Cc: linux-raid@xxxxxxxxxxxxxxx; Williams, Dan J; Ciechanowski, Ed
> Subject: RE: Suspend_hi management during reshape
> 
> 
> 
> > -----Original Message-----
> > From: Neil Brown [mailto:neilb@xxxxxxx]
> > Sent: Thursday, December 09, 2010 11:28 AM
> > To: Kwolek, Adam
> > Cc: linux-raid@xxxxxxxxxxxxxxx; Williams, Dan J; Ciechanowski, Ed
> > Subject: Re: Suspend_hi management during reshape
> >
> > On Thu, 9 Dec 2010 08:42:35 +0000 "Kwolek, Adam"
> > <adam.kwolek@xxxxxxxxx>
> > wrote:
> >
> > > Hi,
> > >
> > > I've got a problem with suspend_hi management during check-pointing,
> > > which we discussed a while ago.
> > > Currently, I've corrected check-pointing so that mdmon sets suspend_hi
> > > to the position that sync_max is set to in the current pass, to guard
> > > access.
> > > This assumption looks OK to me in general; the problem is when mdadm
> > > decides to set sync_max to max. mdmon cannot set suspend_hi to max,
> > > because this would block the rest of the array to the user. This means
> > > that mdmon should move sync_max and suspend_hi in parallel through the
> > > rest of the array in steps of some size.
> > > This also gives us additional opportunities to store checkpoints. I
> > > would like to know your opinion about such a solution.
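A minimal sketch of the stepping Adam proposes, for illustration only: instead of jumping suspend_hi to the end of the array when sync_max is set to "max", both values are advanced together in fixed-size steps, and each step is a point where a checkpoint could be stored. STEP_SECTORS, next_window_end() and count_steps() are hypothetical names, the step size is an arbitrary assumption, and positions are assumed to be in sectors. Which daemon would run this (mdadm or mdmon) is exactly what the thread is debating, so the sketch does not decide it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical step size, assumed to be in sectors; not taken from mdadm/mdmon. */
#define STEP_SECTORS 1024ULL

/* Return the next position to which suspend_hi and sync_max would both be
 * advanced, clamped to the array size. */
static uint64_t next_window_end(uint64_t cur_end, uint64_t array_size)
{
	if (cur_end >= array_size || array_size - cur_end <= STEP_SECTORS)
		return array_size;
	return cur_end + STEP_SECTORS;
}

/* Count the steps (i.e. checkpoint opportunities) needed to walk from
 * 'start' to the end of the array instead of jumping straight there. */
static unsigned count_steps(uint64_t start, uint64_t array_size)
{
	unsigned steps = 0;
	uint64_t pos = start;

	while (pos < array_size) {
		uint64_t next = next_window_end(pos, array_size);

		assert(next > pos); /* every step must make progress */
		pos = next;
		steps++;
	}
	return steps;
}
```

The trade-off is the one Adam raises: a larger step leaves a larger region suspended at once, while a smaller one gives more checkpoint opportunities.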
> >
> > suspend_hi should be manipulated by mdadm, not mdmon.
> >
> > Here is my outline that I sent earlier.  Please base your
> > implementation on this, though feel free to comment if you find some
> > part of it doesn't work.
> >
> > This is from my email to you on 29 Nov 2010
> >  subject: Re: [PATCH 00/53] External Metadata Reshape
> >
> >
> > 1/ mdadm freezes the array so that no recovery or reshape can start.
> > 2/ mdadm sets sync_max to 0 so that even when the array is unfrozen, no
> >    data will be relocated.  It also sets suspend_lo and suspend_hi to zero.
> > 3/ mdadm tells the kernel about the requested reshape, setting some or
> >    all of chunk_size, layout, level, raid_disks (and later, data_offset
> >    for each device).
> > 4/ mdadm checks that mdmon has noticed the changes and has updated the
> >    metadata to show a reshape-in-progress (ping_monitor).
> > 5/ mdadm unfreezes the array for mdmon (changes the '-' in
> >    metadata_version back to '/') and calls ping_monitor.
> > 6/ mdmon assigns spares as appropriate and tells the kernel which slot
> >    to use for each.  This requires a kernel change.  The slot number will
> >    be stored in saved_raid_disk.  ping_monitor doesn't complete until the
> >    spares have been assigned.
> > 7/ mdadm asks the kernel to start the reshape
> >    (echo reshape > sync_action).  This causes md_check_recovery to call
> >    remove_and_add_spares, which will add the chosen spares to the
> >    required slots and will create the reshape thread.  That thread will
> >    not actually do anything yet, as sync_max is still 0.
> >
> > 8/ Now we loop, performing backups, reshaping data, and updating the
> >    metadata.  It proceeds in a 'double-buffered' process where we are
> >    backing up one section while the previous section is being reshaped.
> >
> >  8a/ mdadm sets suspend_hi to a larger number.  This blocks until
> >      intervening IO is flushed.
> >  8b/ mdadm makes a backup copy of the data up to the new suspend_hi
> >  8c/ mdadm updates sync_max to match suspend_hi.
> >  8d/ kernel starts reshaping data and periodically signals progress
> >      through sync_completed.
> >  8e/ mdmon notices sync_completed changing and updates the metadata to
> >      record how far the reshape has progressed.
> >  8f/ mdadm notices sync_completed changing, and when it passes the end
> >      of the oldest of the two sections being worked on, it uses
> >      ping_monitor to ensure the metadata is up to date, then moves
> >      suspend_lo to the beginning of the next section and goes back to 8a.
> >
> > 9/ When sync_completed reaches the end of the array, mdmon will notice
> >    and update the metadata to show that the reshape has finished, and
> >    mdadm will set both suspend_lo and suspend_hi to beyond the end of
> >    the array, and all is done.
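To make the double-buffered loop in steps 8a to 8f easier to follow, here is a small user-space simulation of it. Nothing here touches a real array: struct md_sim is a hypothetical in-memory stand-in for the sysfs attributes the outline names (suspend_lo, suspend_hi, sync_max, sync_completed), kernel_tick() is an assumed stand-in for the reshape thread, the backup in 8b and the ping_monitor/mdmon work are reduced to a counter and comments, and positions are assumed to be in sectors. Not waiting on the very first section (there is no older section yet) is this sketch's reading of 8f, not something the outline spells out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical in-memory stand-in for the md sysfs attributes. */
struct md_sim {
	uint64_t size;           /* array size */
	uint64_t suspend_lo;
	uint64_t suspend_hi;
	uint64_t sync_max;
	uint64_t sync_completed;
	uint64_t backed_up;      /* end of the region mdadm has backed up */
};

static uint64_t min_u64(uint64_t a, uint64_t b) { return a < b ? a : b; }

/* Assumed stand-in for the reshape thread (8d): advance by at most 'step',
 * never past sync_max, and check the invariants the outline relies on. */
static void kernel_tick(struct md_sim *md, uint64_t step)
{
	md->sync_completed = min_u64(md->sync_completed + step, md->sync_max);
	assert(md->sync_max <= md->suspend_hi);       /* only suspended data moves */
	assert(md->sync_completed <= md->backed_up);  /* only backed-up data moves */
	assert(md->suspend_lo <= md->sync_completed); /* released data is reshaped */
}

/* Step 2, then 8a-8f until the end, then step 9.
 * Returns the number of sections backed up. */
static unsigned simulate_reshape(struct md_sim *md, uint64_t section, uint64_t kstep)
{
	unsigned sections = 0;

	assert(section > 0 && kstep > 0);
	/* 2/: nothing may be relocated yet. */
	md->sync_max = md->suspend_lo = md->suspend_hi = 0;
	md->sync_completed = md->backed_up = 0;

	while (md->suspend_lo < md->size) {
		if (md->suspend_hi < md->size) {
			/* 8a/: suspend one more section. */
			md->suspend_hi = min_u64(md->suspend_hi + section, md->size);
			/* 8b/: back it up while the kernel works on the previous one. */
			md->backed_up = md->suspend_hi;
			sections++;
			kernel_tick(md, kstep);
			/* 8c/: allow the kernel to reshape up to suspend_hi. */
			md->sync_max = md->suspend_hi;
		}
		/* 8f/: once two sections are in flight (or none remain to add),
		 * wait for the oldest to finish, then release it. */
		if (md->suspend_hi - md->suspend_lo > section || md->suspend_hi == md->size) {
			uint64_t oldest_end = min_u64(md->suspend_lo + section, md->size);

			while (md->sync_completed < oldest_end)
				kernel_tick(md, kstep); /* 8d/; mdmon would record progress (8e/) */
			/* ping_monitor would be called here before moving suspend_lo. */
			md->suspend_lo = oldest_end;
		}
	}
	/* 9/: both ends end up at the end of the array. */
	md->suspend_lo = md->suspend_hi = md->size;
	return sections;
}
```

The asserts in kernel_tick() encode why the steps are ordered as they are: sync_max never runs ahead of suspend_hi, and nothing is reshaped before it has been backed up.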
> 
> 
> Yes, I've got it, but in the disk-add case (OLCE) mdadm participates in
> the process only at the beginning.
> After a short time it directs mdmon to continue the reshape up to the
> sync_max position, once the critical section has been passed.
> At this point I think that mdmon should handle setting sync_max. If
> mdmon does what mdadm tells it, it would also have to set suspend_hi to
> the end of the array (mdmon cannot monitor the movement of suspend_hi).
> suspend_hi can only be set properly together with sync_max.
> To summarize, the problem for me is agreeing that mdmon should handle
> moving the sync_max entry when mdadm directs it to set sync_max to max.
> I want to avoid leaving a large area between suspend_lo and suspend_hi
> for a long time (i.e. for the whole reshape).
> 
> ... or should we decide that mdadm participates in the whole process
> (both while working on the critical area and afterwards)?
> Is this your intention?
> 
> >
> > >
> > > The second problem is cleanup after the reshape.
> > > From user space after the reshape, I'm not able to set suspend_hi to 0.
> > > This is due to the checks in suspend_hi_store() (suspend_lo cannot be
> > > set to 0, and suspend_hi cannot be less than suspend_lo).
> > > I think that part of Maciek's patch should be applied to md in
> > > raid5.c, so that the following code is placed at the end of
> > > raid5_finish_reshape():
> > >
> > > if (mddev->external) {
> > > 	mddev->suspend_hi = 0;
> > > 	mddev->suspend_lo = 0;
> > > 	mddev->pers->quiesce(mddev, 1);
> > > 	mddev->pers->quiesce(mddev, 0);
> > > }
> > >
> > > The other option is to accept setting suspend_lo/hi to 0 when no
> > > array processing (reshape) is in progress, but I think the first
> > > change is better.
> > > What is your opinion?
> >
> > Why do you want to set suspend_hi to zero after a reshape?
> > Just set both suspend_hi and suspend_lo to the size of the array (which
> > is where the above process would get them to) and leave them there.
> >
> > NeilBrown
> 
> I'll try to set those values as you described.
> 
> I wanted to set suspend_lo/hi to 0 to return those entries to the state
> they were in before the reshape.
> My thinking is: if I cannot manage those keys after the reshape, how can
> I repeat the reshape process (e.g. with other grow parameters)?
> I will need to manage them before I start the next operation. After a
> reshape the array (imho) should be ready for any next action, and I think
> it is not ready now.
> Am I right?

OK, it works :). After setting those values to the end of the array, they can be moved back to 0.
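The write rules Adam describes for suspend_lo and suspend_hi, and why parking both at the array size first lets them return to 0, can be modelled in a few lines. This is a hypothetical model chosen to reproduce the behaviour reported in this thread; it is not the code of suspend_lo_store()/suspend_hi_store() in md.c, whose exact conditions may differ. struct suspend_window, set_suspend_lo(), set_suspend_hi() and park_then_reset() are illustrative names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the suspended window [lo, hi); not md.c code. */
struct suspend_window {
	uint64_t lo;
	uint64_t hi;
};

/* Assumed rule: suspend_hi may drop to at or below lo only when the window
 * is already empty (lo >= hi); otherwise it may only grow past both ends. */
static bool set_suspend_hi(struct suspend_window *w, uint64_t v)
{
	if ((v <= w->lo && w->lo >= w->hi) || (v > w->lo && v > w->hi)) {
		w->hi = v;
		return true;
	}
	return false;
}

/* Assumed rule: suspend_lo may be set to any value at or above hi (leaving
 * the window empty) or moved forward inside the window; all else fails. */
static bool set_suspend_lo(struct suspend_window *w, uint64_t v)
{
	if (v >= w->hi || (v > w->lo && v < w->hi)) {
		w->lo = v;
		return true;
	}
	return false;
}

/* Park both ends at the array size, as suggested in the thread, then move
 * them back to 0, as reported to work.  lo is parked first because, in this
 * model, a value at or above hi is always accepted for suspend_lo. */
static bool park_then_reset(struct suspend_window *w, uint64_t size)
{
	return set_suspend_lo(w, size) && set_suspend_hi(w, size) &&
	       set_suspend_hi(w, 0) && set_suspend_lo(w, 0);
}
```

In this model a window stuck mid-array (lo < hi) rejects both direct resets to 0, which matches Adam's original complaint, while the park-then-reset sequence succeeds as long as hi does not exceed the array size.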


> 
> BR
> Adam
> 
> >
> > >
> > > BR
> > > Adam
> > >
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe linux-
> raid"
> > in
> > > the body of a message to majordomo@xxxxxxxxxxxxxxx
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

