[PATCH 4/4] raid1: Rewrite the implementation of iobarrier.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



There is a iobarrier in raid1 because the contend between normalIO and
resyncIO.It suspend all normal IO when reysync.
But if normalIO is outrange the resync windwos,there is no contend.
So I rewrite the iobarrier.

I patition the whole space into five parts.
|---------|-----------|------------|----------------|-------|
         Pa    next_resync   start_next_window      Pb

Firstly i introduce some concepts:
1:RESYNC_WINDOW: For resync, there are 32 sync requests at most.A sync
request is RESYNC_BLOCK_SIZE(64*1024).So the RESYNC_WINDOW is 32 *
RESYNC_BLOCK_SIZE, that is 2MB.
2:NEXT_NORMALIO_DISTANCE: it indicate the distance between next_resync
and start_next_window.It also indicate the distance between
start_next_window and Pb.Now it is 3 * RESYNC_WINDOW_SIZE.It can tune.
3:next_resync mean the next start sector which will being do sync.
4:Pa means a position which before RESYNC_WINDOW distance from
next_resync.
5:start_next_window means a position which after NEXT_NORMALIO_DISTANCE distance from
next_resync.For normalio after this position,it can't wait resyncio
6:Pb means a position which after 2 * NEXT_NORMALIO_DISTANCE distance from
next_resync.
7:current_window_requests means the count of normalIO between Pb and
start_next_window.
8:next_window_requests means the count of nonmalIO after Pb.

NormalIO will be partitioned into four types:
NoIO1: it means the end sector of bio is smaller or equal the Pa.
NoIO2: it means the start sector of bio larger or equal Pb
NoIO3: it means the start sector of bio larger or equal	start_next_window.
NoIO4: it means the location between Pa and start_next_window

For NoIO1,it don't use iobarrier.
For NoIO4,it used the original iobarrier mechanism.The nornalIO and
syncIO must be contend.
For NoIO2/3, i add two filed in struct r1conf "current_window_requests,
next_window_requests".They indicate the count of two types.
For those, they don't wait.They only add the relevant parameter.

For resync action, if there are NoIO4s, it must be wait.If not,it
forward.But if current_window_requests > 0 and sync action reached to the
start_next_window,it must wait until the current_window_requests became
zero.If current_window_requests became zero,the start_next_window also
forward. The current_window_requests will replaced by
next_window_requests.

There is a problem which when and how to change from NoIO2 to NoIO3.Only
so, the sync action can forward.

I add a filed in struct r1conf "start_next_window"..What condition will change?
A: if start_next_window == MaxSector, it means there are no NoIO2/3.
   So start_next_window = next_resync + NEXT_NORMALIO_DISTANCE
B: if current_window_requests == 0 && next_window_requests != 0, it
   means start_next_window move to Pb

There is anthor problem which how to differentiate when NoIO2 became
NoIO3.
For example, there are many bios which are NoIO2 and a bio which is
NoIO3. NoIO3 firstly completed,so the bios of NoIO2 became NoIO3.
At generil,we should use flags and list_head. The codes should:
>list_splice_init(NoIO2, NoIO3);
>current_window_requests = next_window_requests;
>next_window_requests = 0;
>list_for_each_entry(NoIO3){
>       clear_bit(NoIO2_FLAG),
>       set_bit(NoIO3_FLAG,
>}
If there are many NoIO2, it will take too long in list_for_each_entry.
Avoid this, i add a filed in struct r1bio "start_next_window".
I used this to record the position conf->start_next_window when call wait_barrier in func
make_request.
For NoIO1/4, the value is zero.
In func allow_barrier, it check the conf->start_next_window
If r1bio->stat_next_window == conf->start_next_window,it mean
there is no transition which NoIO2 to NoIO3.In this condtion
 if bio->bi_sector > conf->start_next_window + NEXT_NORMALIO_DISTANCE
    it means the bio is NoIO2;
 else
    it means the bio si NoIO3.

If r1bio->start_next_window != conf->start_next_window,it mean
there is a trnasiton which NoIO2 to NoIO3.Because there is at most one
transtions.So it only means the bio is NoIO2.

For one bio,there are many r1bio.So we make sure the
r1bio->start_next_window is the same value.
If we met blocked_dev, it must call allow_barrier and wait_barrier.
So the first and the second value of conf->start_next_window will
change.If there are many r1bio's with differnet start_next_window.
For the relevant bio, it depent on the last value of r1bio.
It will cause error.To avoid this, we must wait previous r1bios completes.

Signed-off-by: Jianpeng Ma <majianpeng@xxxxxxxxx>
---
 drivers/md/raid1.c | 120 +++++++++++++++++++++++++++++++++++++++++++++++------
 drivers/md/raid1.h |  14 +++++++
 2 files changed, 121 insertions(+), 13 deletions(-)

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 08f076a..4499870 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -66,7 +66,8 @@
  */
 static int max_queued_requests = 1024;
 
-static void allow_barrier(struct r1conf *conf);
+static void allow_barrier(struct r1conf *conf, sector_t start_next_window,
+				sector_t bi_sector);
 static void lower_barrier(struct r1conf *conf);
 
 static void * r1bio_pool_alloc(gfp_t gfp_flags, void *data)
@@ -227,6 +228,8 @@ static void call_bio_endio(struct r1bio *r1_bio)
 	struct bio *bio = r1_bio->master_bio;
 	int done;
 	struct r1conf *conf = r1_bio->mddev->private;
+	sector_t start_next_window = r1_bio->start_next_window;
+	sector_t bi_sector = bio->bi_sector;
 
 	if (bio->bi_phys_segments) {
 		unsigned long flags;
@@ -234,6 +237,11 @@ static void call_bio_endio(struct r1bio *r1_bio)
 		bio->bi_phys_segments--;
 		done = (bio->bi_phys_segments == 0);
 		spin_unlock_irqrestore(&conf->device_lock, flags);
+		/*
+		 * make_request might be waiting for
+		 * bi_phys_sectors to decrease
+		 */
+		wake_up(&conf->wait_barrier);
 	} else
 		done = 1;
 
@@ -245,7 +253,7 @@ static void call_bio_endio(struct r1bio *r1_bio)
 		 * Wake up any possible resync thread that waits for the device
 		 * to go idle.
 		 */
-		allow_barrier(conf);
+		allow_barrier(conf, start_next_window, bi_sector);
 	}
 }
 
@@ -827,10 +835,19 @@ static void raise_barrier(struct r1conf *conf)
 	/* block any new IO from starting */
 	conf->barrier++;
 
-	/* Now wait for all pending IO to complete */
+	/* For those conditions, we must wait
+	* A:array is in frozen state.
+	* B:conf->nr_pending > 0 mean in sync window there are normal bios.
+	* C:total barrier are more than RESYNC_DEPTH
+	* D:conf->next_resync + RESYNC_SECTORS > conf->start_next_window
+	*   mean the next nornalIO window not clear and next syncIO will be into
+	*   this window.
+	*/
 	wait_event_lock_irq(conf->wait_barrier,
 			    !conf->array_frozen && 
-			    !conf->nr_pending && conf->barrier < RESYNC_DEPTH,
+			    !conf->nr_pending && conf->barrier < RESYNC_DEPTH &&
+			    (conf->next_resync + RESYNC_SECTORS
+				    <= conf->start_next_window),
 			    conf->resync_lock);
 
 	spin_unlock_irq(&conf->resync_lock);
@@ -846,10 +863,43 @@ static void lower_barrier(struct r1conf *conf)
 	wake_up(&conf->wait_barrier);
 }
 
-static void wait_barrier(struct r1conf *conf)
+static bool need_to_wait_for_sync(struct r1conf *conf, struct bio *bio)
 {
+	bool wait = false;
+
+	if (conf->array_frozen || !bio)
+		wait = true;
+	else if (conf->barrier && bio_data_dir(bio) == WRITE) {
+		if (conf->next_resync < RESYNC_WINDOW_SECTORS)
+			wait = true;
+		else if (conf->next_resync - RESYNC_WINDOW_SECTORS >=
+				bio_end_sector(bio))
+			wait = false;
+		else if (conf->next_resync + NEXT_NORMALIO_DISTANCE
+				<= bio->bi_sector) {
+			if (conf->start_next_window == MaxSector)
+				conf->start_next_window =
+					conf->next_resync +
+						NEXT_NORMALIO_DISTANCE;
+
+			if ((conf->start_next_window + NEXT_NORMALIO_DISTANCE)
+				<= bio->bi_sector)
+				conf->next_window_requests++;
+			else
+				conf->current_window_requests++;
+		} else
+			wait = true;
+	}
+
+	return wait;
+}
+
+static sector_t wait_barrier(struct r1conf *conf, struct bio *bio)
+{
+	sector_t sector = 0;
+
 	spin_lock_irq(&conf->resync_lock);
-	if (conf->barrier) {
+	if (need_to_wait_for_sync(conf, bio)) {
 		conf->nr_waiting++;
 		/* Wait for the barrier to drop.
 		 * However if there are already pending
@@ -868,16 +918,42 @@ static void wait_barrier(struct r1conf *conf)
 				     !bio_list_empty(current->bio_list))),
 				    conf->resync_lock);
 		conf->nr_waiting--;
-	}
+	} else if (bio_data_dir(bio) == WRITE &&
+			bio->bi_sector >= conf->start_next_window)
+		sector = conf->start_next_window;
 	conf->nr_pending++;
 	spin_unlock_irq(&conf->resync_lock);
+	return sector;
 }
 
-static void allow_barrier(struct r1conf *conf)
+static void allow_barrier(struct r1conf *conf, sector_t start_next_window,
+				sector_t bi_sector)
 {
 	unsigned long flags;
+
 	spin_lock_irqsave(&conf->resync_lock, flags);
 	conf->nr_pending--;
+	if (start_next_window) {
+		if (start_next_window == conf->start_next_window) {
+			if (conf->start_next_window + NEXT_NORMALIO_DISTANCE
+				<= bi_sector)
+				conf->next_window_requests--;
+			else
+				conf->current_window_requests--;
+		} else
+			conf->current_window_requests--;
+
+		if (!conf->current_window_requests) {
+			if (conf->next_window_requests) {
+				conf->current_window_requests =
+					conf->next_window_requests;
+				conf->next_window_requests = 0;
+				conf->start_next_window +=
+					NEXT_NORMALIO_DISTANCE;
+			} else
+				conf->start_next_window = MaxSector;
+		}
+	}
 	spin_unlock_irqrestore(&conf->resync_lock, flags);
 	wake_up(&conf->wait_barrier);
 }
@@ -1012,6 +1088,7 @@ static void make_request(struct mddev *mddev, struct bio * bio)
 	int first_clone;
 	int sectors_handled;
 	int max_sectors;
+	sector_t start_next_window;
 
 	/*
 	 * Register the new request and wait if the reconstruction
@@ -1041,7 +1118,7 @@ static void make_request(struct mddev *mddev, struct bio * bio)
 		finish_wait(&conf->wait_barrier, &w);
 	}
 
-	wait_barrier(conf);
+	start_next_window = wait_barrier(conf, bio);
 
 	bitmap = mddev->bitmap;
 
@@ -1162,6 +1239,7 @@ read_again:
 
 	disks = conf->raid_disks * 2;
  retry_write:
+	r1_bio->start_next_window = start_next_window;
 	blocked_rdev = NULL;
 	rcu_read_lock();
 	max_sectors = r1_bio->sectors;
@@ -1230,14 +1308,24 @@ read_again:
 	if (unlikely(blocked_rdev)) {
 		/* Wait for this device to become unblocked */
 		int j;
+		sector_t old = start_next_window;
 
 		for (j = 0; j < i; j++)
 			if (r1_bio->bios[j])
 				rdev_dec_pending(conf->mirrors[j].rdev, mddev);
 		r1_bio->state = 0;
-		allow_barrier(conf);
+		allow_barrier(conf, start_next_window, bio->bi_sector);
 		md_wait_for_blocked_rdev(blocked_rdev, mddev);
-		wait_barrier(conf);
+		start_next_window = wait_barrier(conf, bio);
+		/*
+		 * We must make sure the multi r1bios of bio have
+		 * the same value of after_threetimes_resync_sector
+		 */
+		if (bio->bi_phys_segments && old &&
+			old != start_next_window)
+			/*wait the former r1bio(s) completed*/
+			wait_event(conf->wait_barrier,
+					bio->bi_phys_segments == 1);
 		goto retry_write;
 	}
 
@@ -1437,11 +1525,14 @@ static void print_conf(struct r1conf *conf)
 
 static void close_sync(struct r1conf *conf)
 {
-	wait_barrier(conf);
-	allow_barrier(conf);
+	wait_barrier(conf, NULL);
+	allow_barrier(conf, 0, 0);
 
 	mempool_destroy(conf->r1buf_pool);
 	conf->r1buf_pool = NULL;
+
+	conf->next_resync = 0;
+	conf->start_next_window = MaxSector;
 }
 
 static int raid1_spare_active(struct mddev *mddev)
@@ -2712,6 +2803,9 @@ static struct r1conf *setup_conf(struct mddev *mddev)
 	conf->pending_count = 0;
 	conf->recovery_disabled = mddev->recovery_disabled - 1;
 
+	conf->start_next_window = MaxSector;
+	conf->current_window_requests = conf->next_window_requests = 0;
+
 	err = -EIO;
 	for (i = 0; i < conf->raid_disks * 2; i++) {
 
diff --git a/drivers/md/raid1.h b/drivers/md/raid1.h
index 331a98a..07425a1 100644
--- a/drivers/md/raid1.h
+++ b/drivers/md/raid1.h
@@ -41,6 +41,19 @@ struct r1conf {
 	 */
 	sector_t		next_resync;
 
+	/*when raid1 start resync,we divide raid into four partitions
+	 * |---------|--------------|---------------------|-------------|
+	 *        next_resync   start_next_window        Pc
+	 * Now start_next_window = next_resync + NEXT_NORMALIO_DISTANCE
+	 * Pc = start_next_window + NEXT_NORMALIO_DISTANCE
+	 * current_window_requests means the count of normalIO between
+	 * start_next_window and Pc.
+	 * next_window_requests means the count of nornalIO after Pc.
+	 * */
+	sector_t		start_next_window;
+	int			current_window_requests;
+	int			next_window_requests;
+
 	spinlock_t		device_lock;
 
 	/* list of 'struct r1bio' that need to be processed by raid1d,
@@ -112,6 +125,7 @@ struct r1bio {
 						 * in this BehindIO request
 						 */
 	sector_t		sector;
+	sector_t		start_next_window;
 	int			sectors;
 	unsigned long		state;
 	struct mddev		*mddev;
-- 
1.8.1.2
ÿôèº{.nÇ+?·?®?­?+%?Ëÿ±éݶ¥?wÿº{.nÇ+?·¥?{±þ¶¢wø§¶?¡Ü¨}©?²Æ zÚ&j:+v?¨þø¯ù®w¥þ?à2?Þ?¨è­Ú&¢)ß¡«a¶Úÿÿûàz¿äz¹Þ?ú+?ù???Ý¢jÿ?wèþf





[Index of Archives]     [Linux RAID Wiki]     [ATA RAID]     [Linux SCSI Target Infrastructure]     [Linux Block]     [Linux IDE]     [Linux SCSI]     [Linux Hams]     [Device Mapper]     [Device Mapper Cryptographics]     [Kernel]     [Linux Admin]     [Linux Net]     [GFS]     [RPM]     [git]     [Yosemite Forum]


  Powered by Linux