Re: [patch 1/4] raid1: directly dispatch write request if no bitmap

NeilBrown <neilb@xxxxxxx> · Thu, 24 May 2012 12:21:12 +1000

On Wed, 23 May 2012 15:26:20 +0800 Shaohua Li <shli@xxxxxxxxxx> wrote:

> In raid1, all write requests are dispatched in raid1d thread. In fast storage,
> the raid1d thread is a bottleneck, because it dispatches request too slow. Also
> raid1d thread migrates freely, which makes request completion cpu not match
> with submission cpu even driver/block layer has such capability. This will
> cause bad cache issue.
> 
> If no bitmap, there is no point to queue bio to a thread and dispatch it in the
> thread. Directly dispatching bio doesn't impact correctness and removes above
> bottleneck.
> 
> Multiple threads dispatch requests could potentially reduce request merge and
> increase lock contention. For slow stroage, we just worry about request merge.
> Caller of .make_request should already have correct block plug set, which will
> take care of request merge and locking just like accessing raw device, so we
> don't need worry about this too much.
> 
> In a 4k randwrite test with a 2 disks setup, below patch can provide 20% ~ 50%
> performance improvements depending on numa binding.
> 
> Signed-off-by: Shaohua Li <shli@xxxxxxxxxxxx>
> ---
>  drivers/md/raid1.c |   11 +++++++----
>  1 file changed, 7 insertions(+), 4 deletions(-)
> 
> Index: linux/drivers/md/raid1.c
> ===================================================================
> --- linux.orig/drivers/md/raid1.c	2012-05-22 13:50:26.989820654 +0800
> +++ linux/drivers/md/raid1.c	2012-05-22 13:56:46.117054559 +0800
> @@ -1187,10 +1187,13 @@ read_again:
>  		mbio->bi_private = r1_bio;
>  
>  		atomic_inc(&r1_bio->remaining);
> -		spin_lock_irqsave(&conf->device_lock, flags);
> -		bio_list_add(&conf->pending_bio_list, mbio);
> -		conf->pending_count++;
> -		spin_unlock_irqrestore(&conf->device_lock, flags);
> +		if (bitmap) {
> +			spin_lock_irqsave(&conf->device_lock, flags);
> +			bio_list_add(&conf->pending_bio_list, mbio);
> +			conf->pending_count++;
> +			spin_unlock_irqrestore(&conf->device_lock, flags);
> +		} else
> +			generic_make_request(mbio);
>  	}
>  	/* Mustn't call r1_bio_write_done before this next test,
>  	 * as it could result in the bio being freed.

This looks like it should be 'obviously correct' but unfortunately it isn't.

If this raid1 is beneath a dm device (e.g. LVM), then generic_make_request
will queue the request internally and not actually start it.
In particular this means that the cloned bio has no chance of being released
before the next clone_bio call which is made on the next time around the loop.

This can lead to a deadlock as the clone_bio() might be waiting for that
first cloned bio to be released (if memory is really tight).

i.e. when allocating multiple bios from a mempool, we need to arrange for
them to be submitted by a separate thread.

Thanks,
NeilBrown
Attachment:
signature.asc

Description: PGP signature