> -----Original Message-----
> From: Neil Brown [mailto:neilb@xxxxxxx]
> Sent: Thursday, December 09, 2010 11:28 AM
> To: Kwolek, Adam
> Cc: linux-raid@xxxxxxxxxxxxxxx; Williams, Dan J; Ciechanowski, Ed
> Subject: Re: Suspend_hi mamagment during reshape
>
> On Thu, 9 Dec 2010 08:42:35 +0000 "Kwolek, Adam" <adam.kwolek@xxxxxxxxx>
> wrote:
>
> > Hi,
> >
> > I've got a problem with suspend_hi management during check-pointing, as
> > we discussed a while ago.
> >
> > Currently, I've corrected check-pointing so that mdmon sets suspend_hi
> > to the position that sync_max is set to in the current pass, to guard
> > access.
> > This assumption looks OK to me in general; the problem arises when mdadm
> > decides to set sync_max to max. mdmon cannot set max, because this would
> > block the rest of the array to the user. This means that mdmon should
> > move sync_max and suspend_hi in parallel through the rest of the array
> > in steps of some distance.
> > This can give us additional opportunities to store checkpoints. I would
> > like to know your opinion about such a solution.
>
> suspend_hi should be manipulated by mdadm, not mdmon.
>
> Here is my outline that I sent earlier. Please base your implementation on
> this, though feel free to comment if you find some part of it doesn't work.
>
> This is from my email to you on 29 Nov 2010
> subject: Re: [PATCH 00/53] External Metadata Reshape
>
>
> 1/ mdadm freezes the array so that no recovery or reshape can start.
> 2/ mdadm sets sync_max to 0 so even when the array is unfrozen, no data will
>    be relocated. It also sets suspend_lo and suspend_hi to zero.
> 3/ mdadm tells the kernel about the requested reshape, setting some or all of
>    chunk_size, layout, level, raid_disks (and later, data_offset for each
>    device).
> 4/ mdadm checks that mdmon has noticed the changes and has updated the
>    metadata to show a reshape-in-progress (ping_monitor).
> 5/ mdadm unfreezes the array for mdmon (change the '-' in metadata_version
>    back to '/') and calls ping_monitor
> 6/ mdmon assigns spares as appropriate and tells the kernel which slot to use
>    for each. This requires a kernel change. The slot number will be stored
>    in saved_raid_disk. ping_monitor doesn't complete until the spares have
>    been assigned.
> 7/ mdadm asks the kernel to start reshape (echo reshape > sync_action).
>    This causes md_check_recovery to call remove_and_add_spares which will
>    add the chosen spares to the required slots and will create the reshape
>    thread. That thread will not actually do anything yet as sync_max
>    is still 0.
>
> 8/ Now we loop, performing backups, reshaping data, and updating the
>    metadata.
>    It proceeds in a 'double-buffered' process where we are backing up one
>    section while the previous section is being reshaped.
>
>  8a/ mdadm sets suspend_hi to a larger number. This blocks until intervening
>      IO is flushed.
>  8b/ mdadm makes a backup copy of the data up to the new suspend_hi
>  8c/ mdadm updates sync_max to match suspend_hi.
>  8d/ kernel starts reshaping data and periodically signals progress through
>      sync_completed
>  8e/ mdmon notices sync_completed changing and updates the metadata to
>      record how far the reshape has progressed.
>  8f/ mdadm notices sync_completed changing and when it passes the end of the
>      oldest of the two sections being worked on it uses ping_monitor to
>      ensure the metadata is up-to-date and then moves suspend_lo to the
>      beginning of the next section, and then goes back to 8a.
>
> 9/ When sync_completed reaches the end of the array, mdmon will notice and
>    update the metadata to show that the reshape has finished, and mdadm will
>    set both suspend_lo and suspend_hi to beyond the end of the array, and all
>    is done.

Yes, I've got it, but in the disk-add case (OLCE) mdadm participates in the process only at the beginning.
After a short time it directs mdmon to continue the reshape up to the
sync_max position, as the critical section is being passed. At this point I
think mdmon should handle setting sync_max. If mdmon simply does what mdadm
tells it, it would also have to set suspend_hi to the end of the array (mdmon
cannot monitor the movement of suspend_hi). Setting suspend_hi properly is
possible only together with sync_max.

To summarize, what I need is agreement that mdmon should handle moving
sync_max when mdadm directs it to set sync_max to max. I want to avoid having
a large area between suspend_lo and suspend_hi for a long time (the duration
of the reshape).

... or should we decide that mdadm participates in the whole process (while
working on the critical area and afterwards)? Is this your intention?

> >
> > The second problem is about cleanup after reshape.
> > From user space, after a reshape, I'm not able to set suspend_hi to 0.
> > This is due to the checks in suspend_hi_store() (suspend_lo cannot be set
> > to 0, and suspend_hi cannot be less than suspend_lo).
> > I think that part of Maciek's patch should be applied to md in raid5.c,
> > so at the end of raid5_finish_reshape() the following code should be
> > placed:
> >
> > if (mddev->external) {
> >         mddev->suspend_hi = 0;
> >         mddev->suspend_lo = 0;
> >         mddev->pers->quiesce(mddev, 1);
> >         mddev->pers->quiesce(mddev, 0);
> > }
> >
> > The other option is to accept setting suspend_lo/hi to 0 when no array
> > processing (reshape) is in progress, but I think the first change is
> > better.
> > What is your opinion?
>
> Why do you want to set suspend_hi to zero after a reshape?
> Just set both suspend_hi and suspend_lo to the size of the array (which is
> where the above process would get them to) and leave them there.
>
> NeilBrown

I'll try to set those values as you described.
I wanted to set suspend_lo/hi to 0 to return those entries to their state
before the reshape. My thinking is this: if I cannot manage those keys after
a reshape, then how can I repeat the reshape process (i.e.
with other grow parameters)? I will need to manage them before I start the
next operation. After a reshape the array should (IMHO) be ready for any
further action. I think it is not ready now. Am I right?

BR
Adam

> >
> > BR
> > Adam
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> > the body of a message to majordomo@xxxxxxxxxxxxxxx
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
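
For reference, the loop in steps 8a-8f of the quoted outline, together with
the end state Neil suggests (both suspend_lo and suspend_hi left at the size
of the array), can be sketched as a small user-space simulation. This is only
a sketch under assumptions, not mdadm or mdmon code: struct reshape_state and
the functions open_next_section(), kernel_reshape() and run_reshape() are
hypothetical names, the fields merely mirror the sysfs attribute names from
the thread, the backup step is not modelled, and the "kernel" finishes each
section instantly.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical in-memory stand-in for the md sysfs attributes named in the
 * outline. Nothing here touches a real array. */
struct reshape_state {
    uint64_t suspend_lo;
    uint64_t suspend_hi;
    uint64_t sync_max;
    uint64_t sync_completed;
    uint64_t array_size;      /* end of the array, in sectors */
};

/* Steps 8a-8c: widen the suspended range by one section, back it up (not
 * modelled), then allow the reshape to run up to the new suspend_hi. */
static void open_next_section(struct reshape_state *s, uint64_t section)
{
    uint64_t remaining = s->array_size - s->suspend_hi;

    s->suspend_hi += (section < remaining) ? section : remaining;  /* 8a */
    /* 8b: a real tool would copy the data below suspend_hi to a backup. */
    s->sync_max = s->suspend_hi;                                   /* 8c */
}

/* Step 8d, simulated: the kernel reshapes everything it is allowed to. */
static void kernel_reshape(struct reshape_state *s)
{
    s->sync_completed = s->sync_max;
}

/* Runs the whole loop and returns the widest suspended range observed. */
static uint64_t run_reshape(struct reshape_state *s, uint64_t section)
{
    uint64_t widest = 0;

    assert(section > 0);
    /* Step 2: start with everything at zero so no data moves yet. */
    s->suspend_lo = s->suspend_hi = s->sync_max = s->sync_completed = 0;

    while (s->sync_completed < s->array_size) {
        open_next_section(s, section);
        if (s->suspend_hi - s->suspend_lo > widest)
            widest = s->suspend_hi - s->suspend_lo;
        kernel_reshape(s);

        /* Step 8f: once more than one section is suspended and the reshape
         * has passed the older one, release it by advancing suspend_lo. */
        if (s->suspend_hi - s->suspend_lo > section)
            s->suspend_lo += section;

        /* The reshape must never run ahead of the suspended, backed-up area. */
        assert(s->suspend_lo <= s->suspend_hi);
        assert(s->sync_max <= s->suspend_hi);
    }

    /* Step 9, with Neil's suggestion: park both values at the array size. */
    s->suspend_lo = s->suspend_hi = s->array_size;
    return widest;
}
```

In this model, calling run_reshape() with array_size = 1000 and a section of
128 sectors never suspends more than two sections (256 sectors) at once, and
leaves both suspend_lo and suspend_hi at 1000 when it returns.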