Re: Re: [PATCH 4/4] raid1: Rewrite the implementation of iobarrier.

majianpeng <majianpeng@xxxxxxxxx> · Fri, 15 Nov 2013 10:29:56 +0800

>On Thu, 31 Oct 2013 11:20:55 +0800 majianpeng <majianpeng@xxxxxxxxx> wrote:
>
>Sorry for the delayed reply.
>
>
>> >On Tue, 29 Oct 2013 09:30:14 +0800 majianpeng <majianpeng@xxxxxxxxx> wrote:
>> >
>> >
>> >Nearly there!!  Just a few more details.  See below.
>> >
>> >
>> >> Signed-off-by: Jianpeng Ma <majianpeng@xxxxxxxxx>
>> >> ---
>> >>  drivers/md/raid1.c | 122 +++++++++++++++++++++++++++++++++++++++++++++++------
>> >>  drivers/md/raid1.h |  14 ++++++
>> >>  2 files changed, 124 insertions(+), 12 deletions(-)
>> >> 
>> >> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
>> >> index b4a6dcd..5b311c0 100644
>> >> --- a/drivers/md/raid1.c
>> >> +++ b/drivers/md/raid1.c
>> >> @@ -66,7 +66,8 @@
>> >>   */
>> >>  static int max_queued_requests = 1024;
>> >>  
>> >> -static void allow_barrier(struct r1conf *conf);
>> >> +static void allow_barrier(struct r1conf *conf, sector_t start_next_window,
>> >> +				sector_t bi_sector);
>> >>  static void lower_barrier(struct r1conf *conf);
>> >>  
>> >>  static void * r1bio_pool_alloc(gfp_t gfp_flags, void *data)
>> >> @@ -227,6 +228,8 @@ static void call_bio_endio(struct r1bio *r1_bio)
>> >>  	struct bio *bio = r1_bio->master_bio;
>> >>  	int done;
>> >>  	struct r1conf *conf = r1_bio->mddev->private;
>> >> +	sector_t start_next_window = r1_bio->start_next_window;
>> >> +	sector_t bi_sector = bio->bi_sector;
>> >
>> >This should  be r1_bio->sector, not bio->bi_sector.
>> >They are often the same but if multiple r1_bios are needed for some reason
>> >(e.g. bad blocks) they may not be.
>> >
>> No, allow_barrier() only for bio not for r1bio.It only do when bio->bi_phys_segments == 0.
>
>Yes, I see that now, thanks.
>
>
>> If this value is r1_bio->sector, the value may has different value as you said.
>> So in allow_barrier():
>>        if (start_next_window) {
>>                 if (start_next_window == conf->start_next_window) {
>>                         if (conf->start_next_window + NEXT_NORMALIO_DISTANCE
>>                                 <= bi_sector)
>>                                 conf->next_window_requests--;
>>                         else 
>>                                 conf->current_window_requests--;
>>                 } else 
>> It will has differnt result.
>> 
>> >>  
>> >>  	if (bio->bi_phys_segments) {
>> >>  		unsigned long flags;
>> >> @@ -234,6 +237,11 @@ static void call_bio_endio(struct r1bio *r1_bio)
>> >>  		bio->bi_phys_segments--;
>> >>  		done = (bio->bi_phys_segments == 0);
>> >>  		spin_unlock_irqrestore(&conf->device_lock, flags);
>> >> +		/*
>> >> +		 * make_request() might be waiting for
>> >> +		 * bi_phys_segments to decrease
>> >> +		 */
>> >> +		wake_up(&conf->wait_barrier);
>> >>  	} else
>> >>  		done = 1;
>> >>  
>> >> @@ -245,7 +253,7 @@ static void call_bio_endio(struct r1bio *r1_bio)
>> >>  		 * Wake up any possible resync thread that waits for the device
>> >>  		 * to go idle.
>> >>  		 */
>> >> -		allow_barrier(conf);
>> >> +		allow_barrier(conf, start_next_window, bi_sector);
>> >>  	}
>> >>  }
>> >>  
>> >> @@ -827,10 +835,18 @@ static void raise_barrier(struct r1conf *conf)
>> >>  	/* block any new IO from starting */
>> >>  	conf->barrier++;
>> >>  
>> >> -	/* Now wait for all pending IO to complete */
>> >> +	/* For those conditions we must wait:
>> >> +	 * A:while the array is in frozen state
>> >> +	 * B:while barrier >= RESYNC_DEPTH, meaning resync reach
>> >> +	 *   the max count which allowed.
>> >> +	 * C:next_resync + RESYNC_SECTORS > start_next_window, meaning
>> >> +	 *   next resync will reach to window which normal bios are handling.
>> >> +	 */
>> >>  	wait_event_lock_irq(conf->wait_barrier,
>> >>  			    !conf->array_frozen &&
>> >> -			    !conf->nr_pending && conf->barrier < RESYNC_DEPTH,
>> >> +			    conf->barrier < RESYNC_DEPTH &&
>> >> +			    (conf->start_next_window >=
>> >> +			     conf->next_resync + RESYNC_SECTORS),
>> >>  			    conf->resync_lock);
>> >
>> >You've removed the test on conf->nr_pending here, which I think is correct.
>> >It counts 'read' requests as well.  Testing start_next_window serves the same
>> >purpose as it is increased whenever current_window_requests reaches zero.
>> >
>> In func wait_barrier():
>>         if (need_to_wait_for_sync(conf, bio)) {
>>               [snip]
>>         } else if (bio_data_dir(bio) == WRITE) {
>>                 if (conf->next_resync + NEXT_NORMALIO_DISTANCE
>>                                 <= bio->bi_sector) {
>>                         if (conf->start_next_window == MaxSector)
>>                                 conf->start_next_window =
>>                                         conf->next_resync +
>>                                                 NEXT_NORMALIO_DISTANCE;
>> 
>>                         if ((conf->start_next_window + NEXT_NORMALIO_DISTANCE)
>>                                 <= bio->bi_sector)
>>                                 conf->next_window_requests++;
>>                         else
>>                                 conf->current_window_requests++;
>>                 }
>>                 if (bio->bi_sector >= conf->start_next_window)
>>                         sector = conf->start_next_window;
>>         }
>> start_next_window only for bio_data_dir(bio) == WRITE. So resync can't wait read io to complete.
>> 
>> >However you having modified as similar test on nr_pending in wait_barrier().
>I meant:       haven't modified a similar test ....
>> >That worries me a bit.  Should that be changed to a test on start_next_window
>> >to match the above change?
>> For this condition, i only copy it.For this condition, i'm not know clearly.
>> Can you give me more details about this?
>
>
>Before your patch we know if there are pending requests (submitted but not
>completed) which stop use from doing resync if conf->nr_pending > 0.
>After your patch we cannot use that that test as it counts READ requests, and
>requests that are outside the range being resynced.
>
>The test is now a bit more complicated.  I think it is:
>
>		 conf->start_next_window <
>                 conf->next_resync + RESYNC_SECTORS
>
>
>i.e. while start_next_window is less than next_resync+RESYNC_SECTORS there
>cold be outstanding writes that are blocking resync.  As soon as those writes
>complete, start_next_window will increase and the resync can complete.
>So if start_next_window < next_resync+RESYNC_SECTORS and there are pending
>request in current->bio_list, then then there is no point waiting for resync
>(it cannot progress anyway) so we may as well submit another write.
>
>So change:
>
>		wait_event_lock_irq(conf->wait_barrier,
>				    !conf->array_frozen &&
>				    (!conf->barrier ||
>				    (conf->nr_pending &&
>				     current->bio_list &&
>				     !bio_list_empty(current->bio_list))),
>				    conf->resync_lock);
>
>in wait barrier to
>
>		wait_event_lock_irq(conf->wait_barrier,
>				    !conf->array_frozen &&
>				    (!conf->barrier ||
>				    (conf->start_next_window < conf->next_resync + RESYNC_SECTORS &&
>				     current->bio_list &&
>				     !bio_list_empty(current->bio_list))),
>				    conf->resync_lock);
>
>
Ok, apply this.
>> >
>> >
>> >>  
>> >>  	spin_unlock_irq(&conf->resync_lock);
>> >> @@ -846,10 +862,33 @@ static void lower_barrier(struct r1conf *conf)
>> >>  	wake_up(&conf->wait_barrier);
>> >>  }
>> >>  
>> >> -static void wait_barrier(struct r1conf *conf)
>> >> +static bool need_to_wait_for_sync(struct r1conf *conf, struct bio *bio)
>> >> +{
>> >> +	bool wait = false;
>> >> +
>> >> +	if (conf->array_frozen || !bio)
>> >> +		wait = true;
>> >> +	else if (conf->barrier && bio_data_dir(bio) == WRITE) {
>> >> +		if (conf->next_resync < RESYNC_WINDOW_SECTORS)
>> >> +			wait = true;
>> >> +		else if ((conf->next_resync - RESYNC_WINDOW_SECTORS
>> >> +				>= bio_end_sector(bio)) ||
>> >> +			 (conf->next_resync + NEXT_NORMALIO_DISTANCE
>> >> +				<= bio->bi_sector))
>> >> +			wait = false;
>> >> +		else
>> >> +			wait = true;
>> >> +	}
>> >> +
>> >> +	return wait;
>> >> +}
>> >> +
>> >> +static sector_t wait_barrier(struct r1conf *conf, struct bio *bio)
>> >>  {
>> >> +	sector_t sector = 0;
>> >> +
>> >>  	spin_lock_irq(&conf->resync_lock);
>> >> -	if (conf->barrier) {
>> >> +	if (need_to_wait_for_sync(conf, bio)) {
>> >>  		conf->nr_waiting++;
>> >>  		/* Wait for the barrier to drop.
>> >>  		 * However if there are already pending
>> >> @@ -868,16 +907,57 @@ static void wait_barrier(struct r1conf *conf)
>> >>  				     !bio_list_empty(current->bio_list))),
>> >>  				    conf->resync_lock);
>> >>  		conf->nr_waiting--;
>> >> +	} else if (bio_data_dir(bio) == WRITE) {
>> >> +		if (conf->next_resync + NEXT_NORMALIO_DISTANCE
>> >> +				<= bio->bi_sector) {
>> >> +			if (conf->start_next_window == MaxSector)
>> >> +				conf->start_next_window =
>> >> +					conf->next_resync +
>> >> +						NEXT_NORMALIO_DISTANCE;
>> >> +
>> >> +			if ((conf->start_next_window + NEXT_NORMALIO_DISTANCE)
>> >> +				<= bio->bi_sector)
>> >> +				conf->next_window_requests++;
>> >> +			else
>> >> +				conf->current_window_requests++;
>> >> +		}
>> >> +		if (bio->bi_sector >= conf->start_next_window)
>> >> +			sector = conf->start_next_window;
>> >
>> >You aren't setting 'sector' if we needed to wait.  I don't think that is
>> >correct, is it?
>> >
>> Yes.How about those code:
>> spin_lock_irq(&conf->resync_lock);
>> retry_check:
>>         if (need_to_wait_for_sync(conf, bio)) {
>>                 conf->nr_waiting++;
>>                 /* Wait for the barrier to drop.
>>                  * However if there are already pending
>>                  * requests (preventing the barrier from
>>                  * rising completely), and the
>>                  * pre-process bio queue isn't empty,
>>                  * then don't wait, as we need to empty
>>                  * that queue to get the nr_pending
>>                  * count down.
>>                  */
>>                 wait_event_lock_irq(conf->wait_barrier,
>>                                     !conf->array_frozen &&
>>                                     (!conf->barrier ||
>>                                     (conf->nr_pending &&
>>                                      current->bio_list &&
>>                                      !bio_list_empty(current->bio_list))),
>>                                     conf->resync_lock);
>>                 conf->nr_waiting--;
>> 				goto retry_check;
>> >
>
>It would look a lot better if you made it a while loop:
>
> while (need_to_wait....) {
>     nr_waiting++
>     wait_event.....
>     nr_waiting--
> if (bio_data_dir.....
>
For this, i think it don;t need while() or recheck.
The code should:

if (nedd_to_wait...) {
	nr_waiting++
	wait_event.....
	nr_waiting--
}
if (bio && bio_data_dir(bio) == WRITE) {
	 do;
}

I think again need_to_wait...(),it must return false. So we don't recheck this.
Or am I missing something?

Thanks!
Jianpeng Ma

?韬{.n?????%??檩??w?{.n???{炳盯w???塄}?财??j:+v??????2??璀??摺?囤??z夸z罐?+?????w棹f