Hi Neil,

At present there are I/O barriers in raid1/raid10. About the barrier you wrote: "Sometimes we need to suspend IO while we do something else, either some resync/recovery, or reconfigure the array." So a normal request in make_request() calls wait_barrier(), and sync_request() calls raise_barrier(). Of course there are other places that call raise_barrier() for other purposes, but here I only want to talk about normal I/O versus resync/recovery.

1: I think the I/O barrier was added because raid1/raid10 do not stripe like raid456, so it is only needed when normal I/O and sync I/O touch the same area of the array. Is that right?

2: If a normal I/O falls outside the resync/recovery window, I think there is no need for it to go through the I/O barrier. I wanted to find the reason by reviewing the git log, but the kernel's git history only goes back to 2.6.12, so I looked at the 2.4 code and found something like what I expected: there was a resync window there. Why was it removed or changed?

3: I think using a resync window to control normal I/O would be ideal but complicated, so I only used one position: a request whose range ends before mddev->curr_resync_completed skips the barrier.
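To make the idea concrete before the patch itself, here is a rough user-space sketch of the barrier hand-off and of the proposed skip. It is only an illustration under my own simplifications: struct io_barrier, resync_completed and resync_running are made-up names, pthreads stand in for the kernel wait queues, and the real code keeps this state in r1conf (conf->barrier, conf->nr_pending and so on).

/*
 * Rough user-space sketch of the raid1 barrier idea; NOT the kernel code.
 * All names below are illustrative only.
 */
#include <pthread.h>
#include <stdbool.h>

struct io_barrier {
        pthread_mutex_t lock;
        pthread_cond_t wait;
        int barrier;                          /* resync passes currently raised */
        int nr_pending;                       /* registered normal I/Os in flight */
        unsigned long long resync_completed;  /* sectors already known to be in sync */
        bool resync_running;
};

/* resync side: block new normal I/O and wait for registered I/O to drain */
void raise_barrier(struct io_barrier *b)
{
        pthread_mutex_lock(&b->lock);
        b->barrier++;
        while (b->nr_pending > 0)
                pthread_cond_wait(&b->wait, &b->lock);
        pthread_mutex_unlock(&b->lock);
}

void lower_barrier(struct io_barrier *b)
{
        pthread_mutex_lock(&b->lock);
        b->barrier--;
        pthread_cond_broadcast(&b->wait);
        pthread_mutex_unlock(&b->lock);
}

/*
 * Normal I/O side.  Returns true if the request registered with the barrier
 * and must call allow_barrier() when it completes; false means it skipped
 * the barrier entirely because its whole range lies below resync_completed.
 */
bool wait_barrier(struct io_barrier *b,
                  unsigned long long sector, unsigned long long nr_sectors)
{
        pthread_mutex_lock(&b->lock);
        if (b->resync_running &&
            sector + nr_sectors <= b->resync_completed) {
                pthread_mutex_unlock(&b->lock);
                return false;
        }
        while (b->barrier > 0)
                pthread_cond_wait(&b->wait, &b->lock);
        b->nr_pending++;
        pthread_mutex_unlock(&b->lock);
        return true;
}

void allow_barrier(struct io_barrier *b)
{
        pthread_mutex_lock(&b->lock);
        b->nr_pending--;
        pthread_cond_broadcast(&b->wait);
        pthread_mutex_unlock(&b->lock);
}

int main(void)
{
        static struct io_barrier b = {
                .lock = PTHREAD_MUTEX_INITIALIZER,
                .wait = PTHREAD_COND_INITIALIZER,
                .resync_completed = 1024,   /* pretend 1024 sectors are in sync */
                .resync_running = true,
        };

        /* a 128-sector request at sector 0 ends below resync_completed, so it
         * skips the barrier; one at sector 2048 has to register as usual */
        bool skipped = !wait_barrier(&b, 0, 128);
        bool registered = wait_barrier(&b, 2048, 128);

        if (registered)
                allow_barrier(&b);
        return (skipped && registered) ? 0 : 1;
}

The point of the return value is that a request which skips the wait never registers with the barrier, so it must not call allow_barrier() when it completes; in the patch below that is what the R1BIO_SKIP_BARRIER flag records.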
The code is:

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 8034fbd..c45a769 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -234,6 +234,7 @@ static void call_bio_endio(struct r1bio *r1_bio)
 {
         struct bio *bio = r1_bio->master_bio;
         int done;
+        int skip_barrier = test_bit(R1BIO_SKIP_BARRIER, &r1_bio->state);
         struct r1conf *conf = r1_bio->mddev->private;
 
         if (bio->bi_phys_segments) {
@@ -253,7 +254,8 @@ static void call_bio_endio(struct r1bio *r1_bio)
                  * Wake up any possible resync thread that waits for the device
                  * to go idle.
                  */
-                allow_barrier(conf);
+                if (!skip_barrier)
+                        allow_barrier(conf);
         }
 }
 
@@ -1007,6 +1009,7 @@ static void make_request(struct mddev *mddev, struct bio * bio)
         int first_clone;
         int sectors_handled;
         int max_sectors;
+        int skip_barrier = 0;
 
         /*
          * Register the new request and wait if the reconstruction
@@ -1036,7 +1039,13 @@ static void make_request(struct mddev *mddev, struct bio * bio)
                 finish_wait(&conf->wait_barrier, &w);
         }
 
-        wait_barrier(conf);
+        if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery) &&
+            (bio->bi_sector + bio->bi_size/512 <=
+             mddev->curr_resync_completed))
+                skip_barrier = 1;
+
+        if (!skip_barrier)
+                wait_barrier(conf);
 
         bitmap = mddev->bitmap;
 
@@ -1053,6 +1062,8 @@ static void make_request(struct mddev *mddev, struct bio * bio)
         r1_bio->mddev = mddev;
         r1_bio->sector = bio->bi_sector;
+        if (skip_barrier)
+                set_bit(R1BIO_SKIP_BARRIER, &r1_bio->state);
 
         /* We might need to issue multiple reads to different
          * devices if there are bad blocks around, so we keep
          * track of the number of reads in bio->bi_phys_segments.
@@ -1229,10 +1240,15 @@ read_again:
                         for (j = 0; j < i; j++)
                                 if (r1_bio->bios[j])
                                         rdev_dec_pending(conf->mirrors[j].rdev, mddev);
-                        r1_bio->state = 0;
-                        allow_barrier(conf);
+                        if (!skip_barrier)
+                                r1_bio->state = 0;
+                        else
+                                set_bit(R1BIO_SKIP_BARRIER, &r1_bio->state);
+                        if (!skip_barrier)
+                                allow_barrier(conf);
                         md_wait_for_blocked_rdev(blocked_rdev, mddev);
-                        wait_barrier(conf);
+                        if (!skip_barrier)
+                                wait_barrier(conf);
                         goto retry_write;
                 }
 
diff --git a/drivers/md/raid1.h b/drivers/md/raid1.h
index 0ff3715..fab9fda 100644
--- a/drivers/md/raid1.h
+++ b/drivers/md/raid1.h
@@ -158,6 +158,8 @@ struct r1bio {
 #define R1BIO_MadeGood 7
 #define R1BIO_WriteError 8
+#define R1BIO_SKIP_BARRIER 9
+
 extern int md_raid1_congested(struct mddev *mddev, int bits);
 
 #endif

At present the code only handles normal I/O versus sync I/O; other paths such as stopping the array, removing or adding a disk, and fix_read_error() also use the I/O barrier.

I used fio to test. The fio job file is:

[global]
filename=/dev/md0
direct=1
bs=512K

[write]
rw=write
runtime=600

[read]
stonewall
rw=read
runtime=600

I first let resync/recovery run for a while so that the normal I/O position was below curr_resync_completed, and I kept the resync/recovery speed at about 30MB/s through sync_speed_min.

The results are:

w/ the above code:
  write: recovery/resync 30MB/s; fio 70MB/s
  read:  recovery/resync 30MB/s; fio 90MB/s

w/o the above code:
  write: recovery/resync 30MB/s; fio 29MB/s
  read:  recovery/resync 30MB/s; fio 31MB/s

The difference is remarkable.

Jianpeng