Re: Re: [PATCH 3/5] md: Increase the frequency of recording checkpoint of recovery.

majianpeng <majianpeng@xxxxxxxxx> · Mon, 29 Oct 2012 17:24:11 +0800

>On Sat, 27 Oct 2012 10:28:20 +0800 majianpeng <majianpeng@xxxxxxxxx> wrote:
>
>> As hard disk capacity grows, the entire recovery process takes
>> longer, so the checkpoint frequency should increase.
>> Suppose the recovery speed is 100MB/s (I think that is good for an HDD).
>> The total recovery time for a 4TB HDD is then about 666 minutes. Changing
>> the frequency from 16 to 64 checkpoints gives one about every 10 minutes.
>> 
>> Signed-off-by: Jianpeng Ma <majianpeng@xxxxxxxxx>
>> ---
>>  drivers/md/md.c |    2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>> 
>> diff --git a/drivers/md/md.c b/drivers/md/md.c
>> index 01e25bd..ba328ee 100644
>> --- a/drivers/md/md.c
>> +++ b/drivers/md/md.c
>> @@ -7446,7 +7446,7 @@ void md_do_sync(struct md_thread *thread)
>>  		if (!test_bit(MD_RECOVERY_RESHAPE, &mddev->recovery) &&
>>  		    ((mddev->curr_resync > mddev->curr_resync_completed &&
>>  		      (mddev->curr_resync - mddev->curr_resync_completed)
>> -		      > (max_sectors >> 4)) ||
>> +		      > (max_sectors >> 6)) ||
>>  		     (j - mddev->curr_resync_completed)*2
>>  		     >= mddev->resync_max - mddev->curr_resync_completed
>>  			    )) {
>
>I don't like this - the number '6' is completely arbitrary.
>
>The original '4' was never about number of minutes.  It was a fraction of the
>total effort needed.
>I wanted to make sure you only repeated at most 10% of the required effort
>(one order of magnitude).  >>4 is about 6% and easy to calculate.
>
>If you want it to be about time (which I don't object to), then make it about
>time.  Keep track of the last time we updated curr_resync_completed, and if
>that is more than a few minutes ago, update it again.
>This is what I said last time - if you want something to be based on time,
>test the time.
>If 'echo idle > sync_action' doesn't work (Which it seems it doesn't), then
>it is OK to just do it in the kernel.
>
>NeilBrown
>
Ok, I used your suggestions.
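
For reference, the intervals discussed above can be checked with a few
lines of userspace C. This is only a sketch, not kernel code: the function
name is hypothetical, and the 4TB size and 100MB/s rate are the figures
quoted above, taken as decimal units.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Seconds between checkpoints when one is recorded every
 * (total_bytes >> shift) bytes at a constant recovery rate.
 * Hypothetical helper for illustration; not part of md.c.
 */
static uint64_t checkpoint_interval_secs(uint64_t total_bytes,
					 uint64_t bytes_per_sec,
					 unsigned int shift)
{
	assert(bytes_per_sec != 0);
	return (total_bytes >> shift) / bytes_per_sec;
}
```

With 4000000000000 bytes at 100000000 bytes/s, a shift of 0 gives 40000
seconds (about 666 minutes for the whole device), a shift of 4 gives 2500
seconds (about 42 minutes), and a shift of 6 gives 625 seconds (about 10
minutes).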

[PATCH V1] md: Update checkpoint of resync/recovery based on time.

As hard disk capacity grows, the entire recovery process takes
longer, so the checkpoint should be recorded more often.
Doing so has several benefits, for example:
1: It avoids repeating work after an OS crash or a sudden power failure.
2: Many places in the md driver behave differently depending on
recovery_cp. See, for example, commit a7854487cd7128a30a7f4.

It is difficult to judge how often the checkpoint should be updated. Here
I chose five minutes. This has no impact on performance.

Signed-off-by: Jianpeng Ma <majianpeng@xxxxxxxxx>
---
 drivers/md/md.c |    7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)
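
The time test in the diff below can be tried outside the kernel with the
following sketch. It is a simplified userspace model under stated
assumptions: HZ is fixed at 250 for illustration, ticks_after_eq() is a
stand-in for the kernel's time_after_eq(), and checkpoint_due() is a
hypothetical helper that leaves out the MD_RECOVERY_RESHAPE and resync_max
conditions of the real check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define HZ			250UL		/* assumed tick rate, illustration only */
#define UPDATE_FREQUENCY	(5 * 60 * HZ)	/* five minutes in ticks, as in the patch */

/*
 * Userspace stand-in for time_after_eq(): true if tick count a is at
 * or after b. The signed difference keeps the result correct when the
 * counter wraps around.
 */
static bool ticks_after_eq(unsigned long a, unsigned long b)
{
	return (long)(a - b) >= 0;
}

/*
 * Simplified checkpoint decision: record a checkpoint when more than
 * 1/16 of the device has been synced since the last one, or when
 * UPDATE_FREQUENCY ticks have passed since the last update.
 */
static bool checkpoint_due(uint64_t done, uint64_t checkpoint,
			   uint64_t max_sectors,
			   unsigned long now, unsigned long last_update)
{
	assert(done >= checkpoint);
	if (done - checkpoint > (max_sectors >> 4))
		return true;
	return ticks_after_eq(now, last_update + UPDATE_FREQUENCY);
}
```

The wraparound-safe comparison matters because the tick counter is an
unsigned long that eventually overflows; a plain "now >= last + interval"
would give the wrong answer around that point.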

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 9ab768a..eddb840 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -7269,6 +7269,7 @@ EXPORT_SYMBOL_GPL(md_allow_write);
 
 #define SYNC_MARKS	10
 #define	SYNC_MARK_STEP	(3*HZ)
+#define UPDATE_FREQUENCY (5*60*HZ)
 void md_do_sync(struct md_thread *thread)
 {
 	struct mddev *mddev = thread->mddev;
@@ -7277,6 +7278,7 @@ void md_do_sync(struct md_thread *thread)
 		 window;
 	sector_t max_sectors,j, io_sectors;
 	unsigned long mark[SYNC_MARKS];
+	unsigned long update_time;
 	sector_t mark_cnt[SYNC_MARKS];
 	int last_mark,m;
 	struct list_head *tmp;
@@ -7436,17 +7438,19 @@ void md_do_sync(struct md_thread *thread)
 	mddev->curr_resync_completed = j;
 	sysfs_notify(&mddev->kobj, NULL, "sync_completed");
 	md_new_event(mddev);
+	update_time = jiffies;
 
 	blk_start_plug(&plug);
 	while (j < max_sectors) {
 		sector_t sectors;
-
+
 		skipped = 0;
 
 		if (!test_bit(MD_RECOVERY_RESHAPE, &mddev->recovery) &&
 		    ((mddev->curr_resync > mddev->curr_resync_completed &&
 		      (mddev->curr_resync - mddev->curr_resync_completed)
 		      > (max_sectors >> 4)) ||
+		     time_after_eq(jiffies, update_time + UPDATE_FREQUENCY) ||
 		     (j - mddev->curr_resync_completed)*2
 		     >= mddev->resync_max - mddev->curr_resync_completed
 			    )) {
@@ -7454,6 +7458,7 @@ void md_do_sync(struct md_thread *thread)
 			wait_event(mddev->recovery_wait,
 				   atomic_read(&mddev->recovery_active) == 0);
 			mddev->curr_resync_completed = j;
+			update_time = jiffies;
 			set_bit(MD_CHANGE_CLEAN, &mddev->flags);
 			sysfs_notify(&mddev->kobj, NULL, "sync_completed");
 		}
-- 
1.7.9.5