>On Thu, 24 Jan 2013 14:02:52 +0800 majianpeng <majianpeng@xxxxxxxxx> wrote:
>
>> There is an iobarrier in raid1 for two reasons: resync/recovery, and
>> reconfiguring the array.
>> At present it suspends all normal IO while resync/recovery is in progress.
>> But normal IO that is outside the resync/recovery window does not need the
>> iobarrier, so I rewrote the iobarrier.
>> Because there are two reasons for the barrier, I use two different methods.
>>
>> First, for resync/recovery there is a resync window. Its end position is
>> 'next_resync'. Because the resync depth is RESYNC_DEPTH (32), the start is
>> 'next_resync - RESYNC_SECTOR * RESYNC_DEPTH'.
>> Normal IO is divided into three categories by its location:
>> a: before the start of the resync window
>> b: inside the resync window
>> c: after the end of the resync window
>> For a, no barrier is needed.
>> For b, the barrier is needed and the original method is used.
>> For c, no barrier is needed, but the minimum position must be recorded.
>> If the next resync step would go beyond that position, the resync action
>> is suspended; otherwise it proceeds.
>> I use an rbtree to order those IOs.
>>
>> For reconfiguring the array, I propose the concept of a "force_barrier".
>> When a force_barrier is raised, all normal IO must be suspended.
>>
>> NOTE:
>> This problem also exists for raid10, but so far I have only done it for
>> raid1. It is posted mainly to make sure it is going in the correct
>> direction and to get helpful comments from others.
>> If the method is accepted, I will send the patch for raid10.
>>
>> Signed-off-by: Jianpeng Ma <majianpeng@xxxxxxxxx>
>
>Hi,
> thanks for this and sorry for the delay in replying.
>

Hi, sorry for the delay in replying. Thanks very much for your suggestions.

>The patch is reasonably good, but there is room for improvement.
>I would break it up into several patches which are easier to review.
>
>- Firstly, don't worry about the barrier for 'read' requests - it really
>  isn't relevant for them (your patch didn't do this).
>

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index bd6a188..2e5bf75 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -235,6 +235,7 @@ static void call_bio_endio(struct r1bio *r1_bio)
 	struct bio *bio = r1_bio->master_bio;
 	int done;
 	struct r1conf *conf = r1_bio->mddev->private;
+	int rw = bio_data_dir(bio);
 
 	if (bio->bi_phys_segments) {
 		unsigned long flags;
@@ -253,7 +254,8 @@ static void call_bio_endio(struct r1bio *r1_bio)
 		 * Wake up any possible resync thread that waits for the device
 		 * to go idle.
 		 */
-		allow_barrier(conf);
+		if (rw == WRITE)
+			allow_barrier(conf);
 	}
 }
 
@@ -1035,7 +1037,8 @@ static void make_request(struct mddev *mddev, struct bio * bio)
 		finish_wait(&conf->wait_barrier, &w);
 	}
 
-	wait_barrier(conf);
+	if (rw == WRITE)
+		wait_barrier(conf);
 
 	bitmap = mddev->bitmap;

The code above is what you suggested, but with it a read error makes raid1d
block forever. The reason is this wait in freeze_array():

	wait_event_lock_irq_cmd(conf->wait_barrier,
				conf->nr_pending == conf->nr_queued+1,

With the patch, read operations no longer increase nr_pending, so this
condition can never become true. Do you have a good way to handle this?

>- Secondly, find those places where raise_barrier() is used to reconfigure
>  the array and replace those with "freeze_array()". I think that is safe,
>  and it means that we don't need to pass a 'force' argument to
>  'raise_barrier()' and 'lower_barrier()'.

But then freeze_array() will block forever. The comment above freeze_array()
says:

	 * This is called in the context of one normal IO request
	 * that has failed.
The test in freeze_array() is:

	wait_event_lock_irq_cmd(conf->wait_barrier,
				conf->nr_pending == conf->nr_queued+1,

Because the current callers are in IO context, there is always at least one
pending request, so the condition can be met. But the places that reconfigure
the array are not in IO context, so the condition
"conf->nr_pending == conf->nr_queued+1" would never become true there.
What if we add a parameter 'int iocontext' to freeze_array(), roughly:

	static void freeze_array(struct r1conf *conf, int iocontext)
	...
		if (iocontext)
			wait_event_lock_irq_cmd(conf->wait_barrier,
						conf->nr_pending == conf->nr_queued+1,
		else
			wait_event_lock_irq_cmd(conf->wait_barrier,
						conf->nr_pending == conf->nr_queued,

How about this method?

>
>- The rest might have to be all one patch, though if it could be done in a
>  couple of distinct stages that would be good.
>  For the different positions (before, during, after), which you currently
>  call 0, 1, and 2 you should use an enum so they all have names.
>

OK, thanks! I will use an enum for them.

>  I don't really like the rbtree. It adds an extra place where you take the
>  spinlock and causes more work to be done inside a spinlock. And it isn't
>  really needed. Instead, you can have two R1BIO flags for "AFTER_RESYNC".
>  One for requests that are more than 2 windows after, one for requests that
>  are less than 2 windows after (or maybe 3 windows or maybe 8..). Each of
>  these has a corresponding count (as you already have: nr_after).
>  Then when resync gets to the end of a window you wait until the count of
>  "less than 2 windows after" reaches zero, then atomically swap the meaning
>  of the two bits (toggle a bit somewhere).

For the resync operation there is no resync window, so how would this be done?
Also, we do not know the nearest position among the requests that are more
than 2 windows after. For example, suppose there are three such requests:
A is after three windows, B is after six windows and C is after eleven
windows. When the count of "less than 2 windows after" reaches zero, what
should happen to them?

>  This should give you nearly the same effect with constant (not log-n)
>  effort.
>
>Finally, have you done any testing, either to ensure there is no slow-down,
>or to demonstrate some improvement?
>

So far I have only tested with dd. There was no apparent performance
degradation.

Thanks!
Jianpeng Ma
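
P.S. To make the 'iocontext' idea above more concrete, below is a rough
sketch of the whole function. It is only a sketch: the body is freeze_array()
as I read it in the current source, with just the wait condition
parameterised, and the 'iocontext'/'extra' names are placeholders.

	static void freeze_array(struct r1conf *conf, int iocontext)
	{
		/* Stop all normal IO and wait for everything to go quiet.
		 * When called from the context of a failed normal request
		 * (iocontext != 0), that request is itself still counted in
		 * nr_pending, so wait for nr_pending == nr_queued + 1.
		 * When called while reconfiguring the array (iocontext == 0)
		 * there is no such request, so wait for
		 * nr_pending == nr_queued.
		 */
		int extra = iocontext ? 1 : 0;

		spin_lock_irq(&conf->resync_lock);
		conf->barrier++;
		conf->nr_waiting++;
		wait_event_lock_irq_cmd(conf->wait_barrier,
					conf->nr_pending == conf->nr_queued + extra,
					conf->resync_lock,
					flush_pending_writes(conf));
		spin_unlock_irq(&conf->resync_lock);
	}

Callers in IO context such as handle_read_error() would pass 1, and the
reconfiguration paths would pass 0.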
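
P.P.S. For the enum for the three positions, this is roughly what I have in
mind. The enum and helper names are only illustrative (they are not in the
patch yet), and the window arithmetic simply follows the description at the
top of this mail:

	enum r1bio_resync_pos {
		POS_BEFORE_RESYNC,	/* ends before the resync window */
		POS_INSIDE_RESYNC,	/* overlaps the window: needs wait_barrier() */
		POS_AFTER_RESYNC,	/* starts after the window: only record position */
	};

	/* 'win_end' is next_resync and 'win_len' is
	 * RESYNC_SECTOR * RESYNC_DEPTH from the description above. */
	static enum r1bio_resync_pos classify_bio(sector_t start, sector_t sectors,
						  sector_t win_end, sector_t win_len)
	{
		sector_t win_start = win_end > win_len ? win_end - win_len : 0;

		if (start + sectors <= win_start)
			return POS_BEFORE_RESYNC;
		if (start >= win_end)
			return POS_AFTER_RESYNC;
		return POS_INSIDE_RESYNC;
	}

make_request() could then switch on the returned value to decide whether to
skip the barrier, call wait_barrier() as before, or only record the position.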