Hi, sorry for the excessive delay in reviewing these. I've applied the first three, though I fixed some careless spelling errors in the patch descriptions and fixed up some of the grammar. This one is fairly good - just a couple of fixes needed. It is good that you have a nice details patch description at the top, but it is a bit hard to read. Fixing the spelling and improving the formatting would help a lot.... See below for comments in the code.... On Wed, 28 Aug 2013 19:40:54 +0800 majianpeng <majianpeng@xxxxxxxxx> wrote: > There is a iobarrier in raid1 because the contend between normalIO and > resyncIO.It suspend all normal IO when reysync. > But if normalIO is outrange the resync windwos,there is no contend. > So I rewrite the iobarrier. > > I patition the whole space into five parts. > |---------|-----------|------------|----------------|-------| > Pa next_resync start_next_window Pb > > Firstly i introduce some concepts: > 1:RESYNC_WINDOW: For resync, there are 32 sync requests at most.A sync > request is RESYNC_BLOCK_SIZE(64*1024).So the RESYNC_WINDOW is 32 * > RESYNC_BLOCK_SIZE, that is 2MB. > 2:NEXT_NORMALIO_DISTANCE: it indicate the distance between next_resync > and start_next_window.It also indicate the distance between > start_next_window and Pb.Now it is 3 * RESYNC_WINDOW_SIZE.It can tune. > 3:next_resync mean the next start sector which will being do sync. > 4:Pa means a position which before RESYNC_WINDOW distance from > next_resync. > 5:start_next_window means a position which after NEXT_NORMALIO_DISTANCE distance from > next_resync.For normalio after this position,it can't wait resyncio > 6:Pb means a position which after 2 * NEXT_NORMALIO_DISTANCE distance from > next_resync. > 7:current_window_requests means the count of normalIO between Pb and > start_next_window. > 8:next_window_requests means the count of nonmalIO after Pb. > > NormalIO will be partitioned into four types: > NoIO1: it means the end sector of bio is smaller or equal the Pa. > NoIO2: it means the start sector of bio larger or equal Pb > NoIO3: it means the start sector of bio larger or equal start_next_window. > NoIO4: it means the location between Pa and start_next_window > > For NoIO1,it don't use iobarrier. > For NoIO4,it used the original iobarrier mechanism.The nornalIO and > syncIO must be contend. > For NoIO2/3, i add two filed in struct r1conf "current_window_requests, > next_window_requests".They indicate the count of two types. > For those, they don't wait.They only add the relevant parameter. > > For resync action, if there are NoIO4s, it must be wait.If not,it > forward.But if current_window_requests > 0 and sync action reached to the > start_next_window,it must wait until the current_window_requests became > zero.If current_window_requests became zero,the start_next_window also > forward. The current_window_requests will replaced by > next_window_requests. > > There is a problem which when and how to change from NoIO2 to NoIO3.Only > so, the sync action can forward. > > I add a filed in struct r1conf "start_next_window"..What condition will change? > A: if start_next_window == MaxSector, it means there are no NoIO2/3. > So start_next_window = next_resync + NEXT_NORMALIO_DISTANCE > B: if current_window_requests == 0 && next_window_requests != 0, it > means start_next_window move to Pb > > There is anthor problem which how to differentiate when NoIO2 became > NoIO3. > For example, there are many bios which are NoIO2 and a bio which is > NoIO3. NoIO3 firstly completed,so the bios of NoIO2 became NoIO3. > At generil,we should use flags and list_head. The codes should: > >list_splice_init(NoIO2, NoIO3); > >current_window_requests = next_window_requests; > >next_window_requests = 0; > >list_for_each_entry(NoIO3){ > > clear_bit(NoIO2_FLAG), > > set_bit(NoIO3_FLAG, > >} > If there are many NoIO2, it will take too long in list_for_each_entry. > Avoid this, i add a filed in struct r1bio "start_next_window". > I used this to record the position conf->start_next_window when call wait_barrier in func > make_request. > For NoIO1/4, the value is zero. > In func allow_barrier, it check the conf->start_next_window > If r1bio->stat_next_window == conf->start_next_window,it mean > there is no transition which NoIO2 to NoIO3.In this condtion > if bio->bi_sector > conf->start_next_window + NEXT_NORMALIO_DISTANCE > it means the bio is NoIO2; > else > it means the bio si NoIO3. > > If r1bio->start_next_window != conf->start_next_window,it mean > there is a trnasiton which NoIO2 to NoIO3.Because there is at most one > transtions.So it only means the bio is NoIO2. > > For one bio,there are many r1bio.So we make sure the > r1bio->start_next_window is the same value. > If we met blocked_dev, it must call allow_barrier and wait_barrier. > So the first and the second value of conf->start_next_window will > change.If there are many r1bio's with differnet start_next_window. > For the relevant bio, it depent on the last value of r1bio. > It will cause error.To avoid this, we must wait previous r1bios completes. > > Signed-off-by: Jianpeng Ma <majianpeng@xxxxxxxxx> > --- > drivers/md/raid1.c | 120 +++++++++++++++++++++++++++++++++++++++++++++++------ > drivers/md/raid1.h | 14 +++++++ > 2 files changed, 121 insertions(+), 13 deletions(-) > > diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c > index 08f076a..4499870 100644 > --- a/drivers/md/raid1.c > +++ b/drivers/md/raid1.c > @@ -66,7 +66,8 @@ > */ > static int max_queued_requests = 1024; > > -static void allow_barrier(struct r1conf *conf); > +static void allow_barrier(struct r1conf *conf, sector_t start_next_window, > + sector_t bi_sector); > static void lower_barrier(struct r1conf *conf); > > static void * r1bio_pool_alloc(gfp_t gfp_flags, void *data) > @@ -227,6 +228,8 @@ static void call_bio_endio(struct r1bio *r1_bio) > struct bio *bio = r1_bio->master_bio; > int done; > struct r1conf *conf = r1_bio->mddev->private; > + sector_t start_next_window = r1_bio->start_next_window; > + sector_t bi_sector = bio->bi_sector; > > if (bio->bi_phys_segments) { > unsigned long flags; > @@ -234,6 +237,11 @@ static void call_bio_endio(struct r1bio *r1_bio) > bio->bi_phys_segments--; > done = (bio->bi_phys_segments == 0); > spin_unlock_irqrestore(&conf->device_lock, flags); > + /* > + * make_request might be waiting for > + * bi_phys_sectors to decrease > + */ > + wake_up(&conf->wait_barrier); bi_phys_sectors??? I think you mean bi_phys_segments. This sort of attention to detail is really important. > } else > done = 1; > > @@ -245,7 +253,7 @@ static void call_bio_endio(struct r1bio *r1_bio) > * Wake up any possible resync thread that waits for the device > * to go idle. > */ > - allow_barrier(conf); > + allow_barrier(conf, start_next_window, bi_sector); > } > } > > @@ -827,10 +835,19 @@ static void raise_barrier(struct r1conf *conf) > /* block any new IO from starting */ > conf->barrier++; > > - /* Now wait for all pending IO to complete */ > + /* For those conditions, we must wait > + * A:array is in frozen state. > + * B:conf->nr_pending > 0 mean in sync window there are normal bios. > + * C:total barrier are more than RESYNC_DEPTH > + * D:conf->next_resync + RESYNC_SECTORS > conf->start_next_window > + * mean the next nornalIO window not clear and next syncIO will be into > + * this window. > + */ This comment is indented badly. /* For these conditions we must wait: * A: while the array is in the frozen state * B: while nr_pending > 0, meaning that there are normal bios in the * resync window ... > wait_event_lock_irq(conf->wait_barrier, > !conf->array_frozen && > - !conf->nr_pending && conf->barrier < RESYNC_DEPTH, > + !conf->nr_pending && conf->barrier < RESYNC_DEPTH && > + (conf->next_resync + RESYNC_SECTORS > + <= conf->start_next_window), I think conditions are much more readable if the thing that you expect to change comes first. In this case we might need to wait until start_next_window increases, so conf->start_next_window >= conf->next_resync + RESYNC_SECTORS as it is, it looks like you are waiting for next_resync to decrease. > conf->resync_lock); > > spin_unlock_irq(&conf->resync_lock); > @@ -846,10 +863,43 @@ static void lower_barrier(struct r1conf *conf) > wake_up(&conf->wait_barrier); > } > > -static void wait_barrier(struct r1conf *conf) > +static bool need_to_wait_for_sync(struct r1conf *conf, struct bio *bio) > { > + bool wait = false; > + > + if (conf->array_frozen || !bio) > + wait = true; > + else if (conf->barrier && bio_data_dir(bio) == WRITE) { > + if (conf->next_resync < RESYNC_WINDOW_SECTORS) > + wait = true; > + else if (conf->next_resync - RESYNC_WINDOW_SECTORS >= > + bio_end_sector(bio)) > + wait = false; > + else if (conf->next_resync + NEXT_NORMALIO_DISTANCE > + <= bio->bi_sector) { > + if (conf->start_next_window == MaxSector) > + conf->start_next_window = > + conf->next_resync + > + NEXT_NORMALIO_DISTANCE; > + > + if ((conf->start_next_window + NEXT_NORMALIO_DISTANCE) > + <= bio->bi_sector) > + conf->next_window_requests++; > + else > + conf->current_window_requests++; > + } else > + wait = true; > + } > + > + return wait; > +} I really don't like it that this function changes start_next_window and *_window_requests(). It looks like it is a simple function that tests a condition. But it actually has a side-effect as well. And it has the side-effect when it returns false which is extra confusing. Also, you are incrementing *_window_requests for READ requests, which you don't need to do. I would change it so the below looks something like: if (need_to_wait_for_sync(conf, bio)) { .... } else if (bio_data_dir(bio) == WRITE) { possibly update start_next_window and *_window_requests here } > + > +static sector_t wait_barrier(struct r1conf *conf, struct bio *bio) > +{ > + sector_t sector = 0; > + > spin_lock_irq(&conf->resync_lock); > - if (conf->barrier) { > + if (need_to_wait_for_sync(conf, bio)) { > conf->nr_waiting++; > /* Wait for the barrier to drop. > * However if there are already pending > @@ -868,16 +918,42 @@ static void wait_barrier(struct r1conf *conf) > !bio_list_empty(current->bio_list))), > conf->resync_lock); > conf->nr_waiting--; > - } > + } else if (bio_data_dir(bio) == WRITE && > + bio->bi_sector >= conf->start_next_window) > + sector = conf->start_next_window; > conf->nr_pending++; > spin_unlock_irq(&conf->resync_lock); > + return sector; > } > > -static void allow_barrier(struct r1conf *conf) > +static void allow_barrier(struct r1conf *conf, sector_t start_next_window, > + sector_t bi_sector) > { > unsigned long flags; > + > spin_lock_irqsave(&conf->resync_lock, flags); > conf->nr_pending--; > + if (start_next_window) { > + if (start_next_window == conf->start_next_window) { > + if (conf->start_next_window + NEXT_NORMALIO_DISTANCE > + <= bi_sector) > + conf->next_window_requests--; > + else > + conf->current_window_requests--; > + } else > + conf->current_window_requests--; > + > + if (!conf->current_window_requests) { > + if (conf->next_window_requests) { > + conf->current_window_requests = > + conf->next_window_requests; > + conf->next_window_requests = 0; > + conf->start_next_window += > + NEXT_NORMALIO_DISTANCE; > + } else > + conf->start_next_window = MaxSector; > + } > + } > spin_unlock_irqrestore(&conf->resync_lock, flags); > wake_up(&conf->wait_barrier); > } > @@ -1012,6 +1088,7 @@ static void make_request(struct mddev *mddev, struct bio * bio) > int first_clone; > int sectors_handled; > int max_sectors; > + sector_t start_next_window; > > /* > * Register the new request and wait if the reconstruction > @@ -1041,7 +1118,7 @@ static void make_request(struct mddev *mddev, struct bio * bio) > finish_wait(&conf->wait_barrier, &w); > } > > - wait_barrier(conf); > + start_next_window = wait_barrier(conf, bio); > > bitmap = mddev->bitmap; > > @@ -1162,6 +1239,7 @@ read_again: > > disks = conf->raid_disks * 2; > retry_write: > + r1_bio->start_next_window = start_next_window; > blocked_rdev = NULL; > rcu_read_lock(); > max_sectors = r1_bio->sectors; > @@ -1230,14 +1308,24 @@ read_again: > if (unlikely(blocked_rdev)) { > /* Wait for this device to become unblocked */ > int j; > + sector_t old = start_next_window; > > for (j = 0; j < i; j++) > if (r1_bio->bios[j]) > rdev_dec_pending(conf->mirrors[j].rdev, mddev); > r1_bio->state = 0; > - allow_barrier(conf); > + allow_barrier(conf, start_next_window, bio->bi_sector); > md_wait_for_blocked_rdev(blocked_rdev, mddev); > - wait_barrier(conf); > + start_next_window = wait_barrier(conf, bio); > + /* > + * We must make sure the multi r1bios of bio have > + * the same value of after_threetimes_resync_sector > + */ after_threetimes_resync_sector???? Attention to detail please! If you resubmit with these things fixed up I'll try to get it reviewed quite a bit quicker than this time :-) Thanks, NeilBrown > + if (bio->bi_phys_segments && old && > + old != start_next_window) > + /*wait the former r1bio(s) completed*/ > + wait_event(conf->wait_barrier, > + bio->bi_phys_segments == 1); > goto retry_write; > } > > @@ -1437,11 +1525,14 @@ static void print_conf(struct r1conf *conf) > > static void close_sync(struct r1conf *conf) > { > - wait_barrier(conf); > - allow_barrier(conf); > + wait_barrier(conf, NULL); > + allow_barrier(conf, 0, 0); > > mempool_destroy(conf->r1buf_pool); > conf->r1buf_pool = NULL; > + > + conf->next_resync = 0; > + conf->start_next_window = MaxSector; > } > > static int raid1_spare_active(struct mddev *mddev) > @@ -2712,6 +2803,9 @@ static struct r1conf *setup_conf(struct mddev *mddev) > conf->pending_count = 0; > conf->recovery_disabled = mddev->recovery_disabled - 1; > > + conf->start_next_window = MaxSector; > + conf->current_window_requests = conf->next_window_requests = 0; > + > err = -EIO; > for (i = 0; i < conf->raid_disks * 2; i++) { > > diff --git a/drivers/md/raid1.h b/drivers/md/raid1.h > index 331a98a..07425a1 100644 > --- a/drivers/md/raid1.h > +++ b/drivers/md/raid1.h > @@ -41,6 +41,19 @@ struct r1conf { > */ > sector_t next_resync; > > + /*when raid1 start resync,we divide raid into four partitions > + * |---------|--------------|---------------------|-------------| > + * next_resync start_next_window Pc > + * Now start_next_window = next_resync + NEXT_NORMALIO_DISTANCE > + * Pc = start_next_window + NEXT_NORMALIO_DISTANCE > + * current_window_requests means the count of normalIO between > + * start_next_window and Pc. > + * next_window_requests means the count of nornalIO after Pc. > + * */ > + sector_t start_next_window; > + int current_window_requests; > + int next_window_requests; > + > spinlock_t device_lock; > > /* list of 'struct r1bio' that need to be processed by raid1d, > @@ -112,6 +125,7 @@ struct r1bio { > * in this BehindIO request > */ > sector_t sector; > + sector_t start_next_window; > int sectors; > unsigned long state; > struct mddev *mddev;
Attachment:
signature.asc
Description: PGP signature