Re: [PATCH 3/5] md: Increase the frequency of recording checkpoint of recovery.

NeilBrown <neilb@xxxxxxx> · Wed, 31 Oct 2012 11:37:24 +1100

On Mon, 29 Oct 2012 17:24:11 +0800 majianpeng <majianpeng@xxxxxxxxx> wrote:

> >On Sat, 27 Oct 2012 10:28:20 +0800 majianpeng <majianpeng@xxxxxxxxx> wrote:
> >
> >> With the hard disk capacity of larger,the entire recovery process
> >> becomes longer.So it should increase the frequency.
> >> Supposed the speed of recovery is 100MB/s(i think for HDD is good).
> >> The total time of recovery 4TB hdd is about 666 minutes.So change the
> >> frequency from 16 to 64 times,it's about 10mins.
> >> 
> >> Signed-off-by: Jianpeng Ma <majianpeng@xxxxxxxxx>
> >> ---
> >>  drivers/md/md.c |    2 +-
> >>  1 file changed, 1 insertion(+), 1 deletion(-)
> >> 
> >> diff --git a/drivers/md/md.c b/drivers/md/md.c
> >> index 01e25bd..ba328ee 100644
> >> --- a/drivers/md/md.c
> >> +++ b/drivers/md/md.c
> >> @@ -7446,7 +7446,7 @@ void md_do_sync(struct md_thread *thread)
> >>  		if (!test_bit(MD_RECOVERY_RESHAPE, &mddev->recovery) &&
> >>  		    ((mddev->curr_resync > mddev->curr_resync_completed &&
> >>  		      (mddev->curr_resync - mddev->curr_resync_completed)
> >> -		      > (max_sectors >> 4)) ||
> >> +		      > (max_sectors >> 6)) ||
> >>  		     (j - mddev->curr_resync_completed)*2
> >>  		     >= mddev->resync_max - mddev->curr_resync_completed
> >>  			    )) {
> >
> >I don't like this - the number '6' is completely arbitrary.
> >
> >The original '4' was never about number of minutes.  It was a fraction of the
> >total effort needed.
> >I wanted to make sure you only repeated at most 10% of the required effort
> >(one order of magnitude).  >>4 is about 6% and easy to calculate.
> >
> >If you want it to be about time (which I don't object to), then make it about
> >time.  Keep track of the last time we updated curr_resync_completed, and if
> >that is more than a few minutes ago, update it again.
> >This is what I said last time - if you want something to be based on time,
> >test the time.
> >If 'echo idle > sync_action' doesn't work (Which it seems it doesn't), then
> >it is OK to just do it in the kernel.
> >
> >NeilBrown
> >
> Ok, I used your suggestions.
> 
> [PATCH V1] md:Updating checkpoint of resync/recovery based time.
> 
> With the hard disk capacity of larger,the entire recovery process
> becomes longer.So it should increase the frequency.
> To do so there are a lot of benefits,for example
> 1:avoid doing repeated work when os crashed or suddenly power failure.
> 2:In md drivers, there are many places to be treated differently based
> on recovery_cp.For example, commit a7854487cd7128a30a7f4.
> 
> It is difficult to judge how long it updated.There i chosed five
> minutes.No impact on performance.
> 
> Signed-off-by: Jianpeng Ma <majianpeng@xxxxxxxxx>
> ---
>  drivers/md/md.c |    7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 9ab768a..eddb840 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -7269,6 +7269,7 @@ EXPORT_SYMBOL_GPL(md_allow_write);
>  
>  #define SYNC_MARKS	10
>  #define	SYNC_MARK_STEP	(3*HZ)
> +#define UPDATE_FREQUENCY (5*60*HZ)
>  void md_do_sync(struct md_thread *thread)
>  {
>  	struct mddev *mddev = thread->mddev;
> @@ -7277,6 +7278,7 @@ void md_do_sync(struct md_thread *thread)
>  		 window;
>  	sector_t max_sectors,j, io_sectors;
>  	unsigned long mark[SYNC_MARKS];
> +	unsigned long update_time;
>  	sector_t mark_cnt[SYNC_MARKS];
>  	int last_mark,m;
>  	struct list_head *tmp;
> @@ -7436,17 +7438,19 @@ void md_do_sync(struct md_thread *thread)
>  	mddev->curr_resync_completed = j;
>  	sysfs_notify(&mddev->kobj, NULL, "sync_completed");
>  	md_new_event(mddev);
> +	update_time = jiffies;
>  
>  	blk_start_plug(&plug);
>  	while (j < max_sectors) {
>  		sector_t sectors;
> -
> +
>  		skipped = 0;
>  
>  		if (!test_bit(MD_RECOVERY_RESHAPE, &mddev->recovery) &&
>  		    ((mddev->curr_resync > mddev->curr_resync_completed &&
>  		      (mddev->curr_resync - mddev->curr_resync_completed)
>  		      > (max_sectors >> 4)) ||
> +		     time_after_eq(jiffies, update_time + UPDATE_FREQUENCY) ||
>  		     (j - mddev->curr_resync_completed)*2
>  		     >= mddev->resync_max - mddev->curr_resync_completed
>  			    )) {
> @@ -7454,6 +7458,7 @@ void md_do_sync(struct md_thread *thread)
>  			wait_event(mddev->recovery_wait,
>  				   atomic_read(&mddev->recovery_active) == 0);
>  			mddev->curr_resync_completed = j;
> +			update_time = jiffies;
>  			set_bit(MD_CHANGE_CLEAN, &mddev->flags);
>  			sysfs_notify(&mddev->kobj, NULL, "sync_completed");
>  		}

Applied, thanks.

NeilBrown

Attachment:
signature.asc

Description: PGP signature