On Thu, 1 Nov 2012 15:58:04 +0800 majianpeng <majianpeng@xxxxxxxxx> wrote:

> Hi Neil:
> At present, there are barriers in raid1/10. About the barrier, you described:
> "Sometimes we need to suspend IO while we do something else, either some
> resync/recovery, or reconfigure the array"
> So when a normal request is handled in make_request, it calls wait_barrier,
> and sync_request calls raise_barrier.
> Of course, there are other places that call raise_barrier for other purposes,
> but here I only want to talk about normal io and resync/recovery.
>
> 1: I think you added the io barrier because raid1/10 doesn't stripe like
> raid456, so it is only for the situation where normal io and sync io touch
> the same content. Is that right?

Same sector, yes.

> 2: If normal ios are outside the resync/recovery window, I think there is
> no need to use the io barrier.

correct.

> I wanted to find the reason by reviewing the git log, but the kernel's git
> history only goes back to 2.6.12.
> So I looked into the kernel 2.4 code and found something like what I had
> thought: there was a resync window.
> But why did you remove or modify it?

I didn't. Ingo Molnar did.

http://git.kernel.org/?p=linux/kernel/git/tglx/history.git;a=commitdiff;h=0925bad356699684440720f2a908030f98bc3284

It wouldn't have generalised to raid10 very well (because resync and recovery
are handled very differently in raid10).

> 3: I think using a resync window to control normal io is ideal but
> complicated, so I only used one position, mddev->curr_resync_completed.

Looks good, though I'll have a proper look next week when I have a bit more
time.

You only need the barrier for write operations, but I think we raise it for
read operations too. We can probably skip that.

Thanks,
NeilBrown

> The code is:
> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> index 8034fbd..c45a769 100644
> --- a/drivers/md/raid1.c
> +++ b/drivers/md/raid1.c
> @@ -234,6 +234,7 @@ static void call_bio_endio(struct r1bio *r1_bio)
>  {
>  	struct bio *bio = r1_bio->master_bio;
>  	int done;
> +	int skip_barrier = test_bit(R1BIO_SKIP_BARRIER, &r1_bio->state);
>  	struct r1conf *conf = r1_bio->mddev->private;
>
>  	if (bio->bi_phys_segments) {
> @@ -253,7 +254,8 @@ static void call_bio_endio(struct r1bio *r1_bio)
>  		 * Wake up any possible resync thread that waits for the device
>  		 * to go idle.
>  		 */
> -		allow_barrier(conf);
> +		if (!skip_barrier)
> +			allow_barrier(conf);
>  	}
>  }
>
> @@ -1007,6 +1009,7 @@ static void make_request(struct mddev *mddev, struct bio * bio)
>  	int first_clone;
>  	int sectors_handled;
>  	int max_sectors;
> +	int skip_barrier = 0;
>
>  	/*
>  	 * Register the new request and wait if the reconstruction
> @@ -1036,7 +1039,13 @@ static void make_request(struct mddev *mddev, struct bio * bio)
>  		finish_wait(&conf->wait_barrier, &w);
>  	}
>
> -	wait_barrier(conf);
> +	if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery) &&
> +	    (bio->bi_sector + bio->bi_size/512 <=
> +	     mddev->curr_resync_completed))
> +		skip_barrier = 1;
> +
> +	if (!skip_barrier)
> +		wait_barrier(conf);
>
>  	bitmap = mddev->bitmap;
>
> @@ -1053,6 +1062,8 @@ static void make_request(struct mddev *mddev, struct bio * bio)
>  	r1_bio->mddev = mddev;
>  	r1_bio->sector = bio->bi_sector;
>
> +	if (skip_barrier)
> +		set_bit(R1BIO_SKIP_BARRIER, &r1_bio->state);
>  	/* We might need to issue multiple reads to different
>  	 * devices if there are bad blocks around, so we keep
>  	 * track of the number of reads in bio->bi_phys_segments.
> @@ -1229,10 +1240,15 @@ read_again:
>  		for (j = 0; j < i; j++)
>  			if (r1_bio->bios[j])
>  				rdev_dec_pending(conf->mirrors[j].rdev, mddev);
> -		r1_bio->state = 0;
> -		allow_barrier(conf);
> +		if (!skip_barrier)
> +			r1_bio->state = 0;
> +		else
> +			set_bit(R1BIO_SKIP_BARRIER, &r1_bio->state);
> +		if (!skip_barrier)
> +			allow_barrier(conf);
>  		md_wait_for_blocked_rdev(blocked_rdev, mddev);
> -		wait_barrier(conf);
> +		if (!skip_barrier)
> +			wait_barrier(conf);
>  		goto retry_write;
>  	}
>
> diff --git a/drivers/md/raid1.h b/drivers/md/raid1.h
> index 0ff3715..fab9fda 100644
> --- a/drivers/md/raid1.h
> +++ b/drivers/md/raid1.h
> @@ -158,6 +158,8 @@ struct r1bio {
>  #define R1BIO_MadeGood		7
>  #define R1BIO_WriteError	8
>
> +#define R1BIO_SKIP_BARRIER	9
> +
>  extern int md_raid1_congested(struct mddev *mddev, int bits);
>
>  #endif
>
> At present, the code only handles normal io and sync io. But operations such
> as stop, remove disk, add disk and fix_read_error also use the io barrier.
> I used fio to test.
> The fio script is:
> [global]
> filename=/dev/md0
> direct=1
> bs=512K
>
> [write]
> rw=write
> runtime=600
>
> [read]
> stonewall
> rw=read
> runtime=600
>
> I first let resync/recovery run for a while so that the normal io position
> was before curr_resync_completed.
> I also kept the resync/recovery speed at about 30MB/s through sync_speed_min.
> The result is:
> w/ above code:
>   write: recovery/resync 30MB/s; fio 70MB/s
>   read:  recovery/resync 30MB/s; fio 90MB/s
>
> w/o above code:
>   write: recovery/resync 30MB/s; fio 29MB/s
>   read:  recovery/resync 30MB/s; fio 31MB/s
>
> The result is remarkable.
>
> Jianpeng
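For readers following the thread, the core of the quoted patch is the window
test performed before wait_barrier(). Below is a minimal user-space sketch of
that test; the struct, field and function names (bio_model, can_skip_barrier
and so on) are illustrative stand-ins, not the kernel's own, and only the
arithmetic mirrors the patch: a bio may bypass the barrier only while
resync/recovery is running and the bio ends at or before
curr_resync_completed.

/*
 * Sketch of the skip-barrier condition from the patch above, modelled in
 * user space for illustration only.  Names are hypothetical; the real code
 * lives in make_request() in drivers/md/raid1.c.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef uint64_t sector_t;          /* 512-byte sectors, as in md */

struct bio_model {
	sector_t bi_sector;         /* first sector of the request */
	uint32_t bi_size;           /* request size in bytes */
};

static bool can_skip_barrier(bool recovery_running,
			     sector_t curr_resync_completed,
			     const struct bio_model *bio)
{
	sector_t end = bio->bi_sector + bio->bi_size / 512;

	/* Only meaningful while resync/recovery is actually running... */
	if (!recovery_running)
		return false;
	/* ...and only for IO entirely below the already-synced region. */
	return end <= curr_resync_completed;
}

int main(void)
{
	struct bio_model bio = { .bi_sector = 1024, .bi_size = 512 * 1024 };

	/* Resync has completed up to sector 1000000: this bio may skip. */
	printf("skip=%d\n", can_skip_barrier(true, 1000000, &bio));
	/* Resync has only reached sector 1500: this bio must wait. */
	printf("skip=%d\n", can_skip_barrier(true, 1500, &bio));
	return 0;
}

As Neil notes above, strictly only writes need the barrier, so the test could
in principle be relaxed further for reads; the patch as posted applies it to
reads and writes alike.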