>On Thu, 24 Jan 2013 14:02:52 +0800 majianpeng <majianpeng@xxxxxxxxx> wrote:
>
>> There is an iobarrier in raid1 for two reasons: resync/recovery, and
>> reconfiguring the array.
>> At present it suspends all normal IO while resync/recovery is in progress.
>> But normal IO that is outside the resync/recovery window does not need the
>> iobarrier, so I rewrote the iobarrier.
>> Because there are two reasons for the barrier, I use two different methods.
>>
>> First, for resync/recovery there is a resync window. Its end position is
>> 'next_resync'. Because the resync depth is RESYNC_DEPTH (32), the start is
>> 'next_resync - RESYNC_SECTOR * RESYNC_DEPTH'.
>> Normal IO is divided into three categories by its location:
>> a: before the start of the resync window
>> b: inside the resync window
>> c: after the end of the resync window
>> For a, no barrier is needed.
>> For b, the barrier is needed and the original method is used.
>> For c, no barrier is needed, but the minimum position must be recorded.
>> If the next resync step would go beyond that position, the resync action
>> is suspended; otherwise it proceeds.
>> I use an rbtree to order those IOs.
>>
>> For reconfiguring the array, I propose the concept of a "force_barrier".
>> When a force_barrier is raised, all normal IO must be suspended.
>>
>> NOTE:
>> This problem also exists for raid10, but so far I have only done it for
>> raid1. It is posted mainly to make sure it is going in the correct
>> direction and to get helpful comments from others.
>> If the method is accepted, I will send the patch for raid10.
>>
>> Signed-off-by: Jianpeng Ma <majianpeng@xxxxxxxxx>
>
>Hi,
> thanks for this and sorry for the delay in replying.
>

Hi, sorry for the delay in replying. Thanks very much for your suggestions.

>The patch is reasonably good, but there is room for improvement.
>I would break it up into several patches which are easier to review.
>
>- Firstly, don't worry about the barrier for 'read' requests - it really
>  isn't relevant for them (your patch didn't do this).
>

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index bd6a188..2e5bf75 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -235,6 +235,7 @@ static void call_bio_endio(struct r1bio *r1_bio)
 	struct bio *bio = r1_bio->master_bio;
 	int done;
 	struct r1conf *conf = r1_bio->mddev->private;
+	int rw = bio_data_dir(bio);
 
 	if (bio->bi_phys_segments) {
 		unsigned long flags;
@@ -253,7 +254,8 @@ static void call_bio_endio(struct r1bio *r1_bio)
 		 * Wake up any possible resync thread that waits for the device
 		 * to go idle.
 		 */
-		allow_barrier(conf);
+		if (rw == WRITE)
+			allow_barrier(conf);
 	}
 }
 
@@ -1035,7 +1037,8 @@ static void make_request(struct mddev *mddev, struct bio * bio)
 		finish_wait(&conf->wait_barrier, &w);
 	}
 
-	wait_barrier(conf);
+	if (rw == WRITE)
+		wait_barrier(conf);
 
 	bitmap = mddev->bitmap;

The code above is what you suggested, but with it a read error makes raid1d
block forever. The reason is this wait in freeze_array():

	wait_event_lock_irq_cmd(conf->wait_barrier,
				conf->nr_pending == conf->nr_queued+1,

With the patch, read operations no longer increase nr_pending, so this
condition can never become true. Do you have a good way to handle this?

>- Secondly, find those places where raise_barrier() is used to reconfigure
>  the array and replace those with "freeze_array()". I think that is safe,
>  and it means that we don't need to pass a 'force' argument to
>  'raise_barrier()' and 'lower_barrier()'.

But then freeze_array() will block forever. The comment above freeze_array()
says:

	 * This is called in the context of one normal IO request
	 * that has failed.
The test in freeze_array() is:

	wait_event_lock_irq_cmd(conf->wait_barrier,
				conf->nr_pending == conf->nr_queued+1,

Because the current callers are in IO context, there is always at least one
pending request, so the condition can be met. But the places that reconfigure
the array are not in IO context, so the condition
"conf->nr_pending == conf->nr_queued+1" would never become true there.
What if we add a parameter 'int iocontext' to freeze_array(), roughly:

	static void freeze_array(struct r1conf *conf, int iocontext)
	...
		if (iocontext)
			wait_event_lock_irq_cmd(conf->wait_barrier,
						conf->nr_pending == conf->nr_queued+1,
		else
			wait_event_lock_irq_cmd(conf->wait_barrier,
						conf->nr_pending == conf->nr_queued,

How about this method?

>
>- The rest might have to be all one patch, though if it could be done in a
>  couple of distinct stages that would be good.
>  For the different positions (before, during, after), which you currently
>  call 0, 1, and 2 you should use an enum so they all have names.
>

OK, thanks! I will use an enum for them.

>  I don't really like the rbtree. It adds an extra place where you take the
>  spinlock and causes more work to be done inside a spinlock. And it isn't
>  really needed. Instead, you can have two R1BIO flags for "AFTER_RESYNC".
>  One for requests that are more than 2 windows after, one for requests that
>  are less than 2 windows after (or maybe 3 windows or maybe 8..). Each of
>  these has a corresponding count (as you already have: nr_after).
>  Then when resync gets to the end of a window you wait until the count of
>  "less than 2 windows after" reaches zero, then atomically swap the meaning
>  of the two bits (toggle a bit somewhere).

For the resync operation there is no resync window, so how would this be done?
Also, we do not know the nearest position among the requests that are more
than 2 windows after. For example, suppose there are three such requests:
A is after three windows, B is after six windows and C is after eleven
windows. When the count of "less than 2 windows after" reaches zero, what
should happen to them?

>  This should give you nearly the same effect with constant (not log-n)
>  effort.
>
>Finally, have you done any testing, either to ensure there is no slow-down,
>or to demonstrate some improvement?
>

So far I have only tested with dd. There was no apparent performance
degradation.

Thanks!
Jianpeng Ma
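
P.S. To make the 'iocontext' idea above more concrete, below is a rough
sketch of the whole function. It is only a sketch: the body is freeze_array()
as I read it in the current source, with just the wait condition
parameterised, and the 'iocontext'/'extra' names are placeholders.

	static void freeze_array(struct r1conf *conf, int iocontext)
	{
		/* Stop all normal IO and wait for everything to go quiet.
		 * When called from the context of a failed normal request
		 * (iocontext != 0), that request is itself still counted in
		 * nr_pending, so wait for nr_pending == nr_queued + 1.
		 * When called while reconfiguring the array (iocontext == 0)
		 * there is no such request, so wait for
		 * nr_pending == nr_queued.
		 */
		int extra = iocontext ? 1 : 0;

		spin_lock_irq(&conf->resync_lock);
		conf->barrier++;
		conf->nr_waiting++;
		wait_event_lock_irq_cmd(conf->wait_barrier,
					conf->nr_pending == conf->nr_queued + extra,
					conf->resync_lock,
					flush_pending_writes(conf));
		spin_unlock_irq(&conf->resync_lock);
	}

Callers in IO context such as handle_read_error() would pass 1, and the
reconfiguration paths would pass 0.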
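
P.P.S. For the enum for the three positions, this is roughly what I have in
mind. The enum and helper names are only illustrative (they are not in the
patch yet), and the window arithmetic simply follows the description at the
top of this mail:

	enum r1bio_resync_pos {
		POS_BEFORE_RESYNC,	/* ends before the resync window */
		POS_INSIDE_RESYNC,	/* overlaps the window: needs wait_barrier() */
		POS_AFTER_RESYNC,	/* starts after the window: only record position */
	};

	/* 'win_end' is next_resync and 'win_len' is
	 * RESYNC_SECTOR * RESYNC_DEPTH from the description above. */
	static enum r1bio_resync_pos classify_bio(sector_t start, sector_t sectors,
						  sector_t win_end, sector_t win_len)
	{
		sector_t win_start = win_end > win_len ? win_end - win_len : 0;

		if (start + sectors <= win_start)
			return POS_BEFORE_RESYNC;
		if (start >= win_end)
			return POS_AFTER_RESYNC;
		return POS_INSIDE_RESYNC;
	}

make_request() could then switch on the returned value to decide whether to
skip the barrier, call wait_barrier() as before, or only record the position.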