Re: [patch]raid5: make_request does less prepare wait

On Wed, 9 Apr 2014 11:25:47 +0800 Shaohua Li <shli@xxxxxxxxxx> wrote:

> On Wed, Apr 09, 2014 at 12:08:08PM +1000, NeilBrown wrote:
> > On Tue, 8 Apr 2014 12:05:07 +0800 Shaohua Li <shli@xxxxxxxxxx> wrote:
> > 
> > > 
> > > On a NUMA machine, prepare_to_wait/finish_wait in make_request exposes a lot
> > > of contention for sequential workloads (or workloads with big request sizes).
> > > For such workloads, each bio includes several stripes, so we can just do
> > > prepare_to_wait/finish_wait once for the whole bio instead of once per stripe.
> > > This eliminates the lock contention for such workloads. Random workloads
> > > might have similar lock contention too, but I haven't seen it yet, maybe
> > > because my storage is still not fast enough.
> > > 
> > > Signed-off-by: Shaohua Li <shli@xxxxxxxxxxxx>
> > 
> > Thanks,
> > this looks very sensible, except .....
> > 
> > 
> > > ---
> > >  drivers/md/raid5.c |   18 ++++++++++++++----
> > >  1 file changed, 14 insertions(+), 4 deletions(-)
> > > 
> > > Index: linux/drivers/md/raid5.c
> > > ===================================================================
> > > --- linux.orig/drivers/md/raid5.c	2014-04-08 09:04:20.000000000 +0800
> > > +++ linux/drivers/md/raid5.c	2014-04-08 09:11:08.201533487 +0800
> > > @@ -4552,6 +4552,8 @@ static void make_request(struct mddev *m
> > >  	struct stripe_head *sh;
> > >  	const int rw = bio_data_dir(bi);
> > >  	int remaining;
> > > +	DEFINE_WAIT(w);
> > > +	bool do_prepare;
> > >  
> > >  	if (unlikely(bi->bi_rw & REQ_FLUSH)) {
> > >  		md_flush_request(mddev, bi);
> > > @@ -4575,15 +4577,19 @@ static void make_request(struct mddev *m
> > >  	bi->bi_next = NULL;
> > >  	bi->bi_phys_segments = 1;	/* over-loaded to count active stripes */
> > >  
> > > +	prepare_to_wait(&conf->wait_for_overlap, &w, TASK_UNINTERRUPTIBLE);
> > >  	for (;logical_sector < last_sector; logical_sector += STRIPE_SECTORS) {
> > >  		DEFINE_WAIT(w);
> >                 ^^^^^^^^^^^^^^^
> > 
> > Shouldn't this be removed?  If so, please resubmit with that line deleted and
> > I'll apply the patch.
> 
> Ah, that's silly, looks like I sent the wrong patch, sorry! Below is the correct
> one, and I double-checked that it's the one working for me.
> 
> 
> Subject: raid5: make_request does less prepare wait
> 
> On a NUMA machine, prepare_to_wait/finish_wait in make_request exposes a lot
> of contention for sequential workloads (or workloads with big request sizes).
> For such workloads, each bio includes several stripes, so we can just do
> prepare_to_wait/finish_wait once for the whole bio instead of once per stripe.
> This eliminates the lock contention for such workloads. Random workloads
> might have similar lock contention too, but I haven't seen it yet, maybe
> because my storage is still not fast enough.
> 
> Signed-off-by: Shaohua Li <shli@xxxxxxxxxxxx>
> ---
>  drivers/md/raid5.c |   19 ++++++++++++++-----
>  1 file changed, 14 insertions(+), 5 deletions(-)
> 
> Index: linux/drivers/md/raid5.c
> ===================================================================
> --- linux.orig/drivers/md/raid5.c	2014-04-08 12:02:54.485630590 +0800
> +++ linux/drivers/md/raid5.c	2014-04-09 11:03:04.276210597 +0800
> @@ -4552,6 +4552,8 @@ static void make_request(struct mddev *m
>  	struct stripe_head *sh;
>  	const int rw = bio_data_dir(bi);
>  	int remaining;
> +	DEFINE_WAIT(w);
> +	bool do_prepare;
>  
>  	if (unlikely(bi->bi_rw & REQ_FLUSH)) {
>  		md_flush_request(mddev, bi);
> @@ -4575,15 +4577,18 @@ static void make_request(struct mddev *m
>  	bi->bi_next = NULL;
>  	bi->bi_phys_segments = 1;	/* over-loaded to count active stripes */
>  
> +	prepare_to_wait(&conf->wait_for_overlap, &w, TASK_UNINTERRUPTIBLE);
>  	for (;logical_sector < last_sector; logical_sector += STRIPE_SECTORS) {
> -		DEFINE_WAIT(w);
>  		int previous;
>  		int seq;
>  
> +		do_prepare = false;
>  	retry:
>  		seq = read_seqcount_begin(&conf->gen_lock);
>  		previous = 0;
> -		prepare_to_wait(&conf->wait_for_overlap, &w, TASK_UNINTERRUPTIBLE);
> +		if (do_prepare)
> +			prepare_to_wait(&conf->wait_for_overlap, &w,
> +				TASK_UNINTERRUPTIBLE);
>  		if (unlikely(conf->reshape_progress != MaxSector)) {
>  			/* spinlock is needed as reshape_progress may be
>  			 * 64bit on a 32bit platform, and so it might be
> @@ -4604,6 +4609,7 @@ static void make_request(struct mddev *m
>  				    : logical_sector >= conf->reshape_safe) {
>  					spin_unlock_irq(&conf->device_lock);
>  					schedule();
> +					do_prepare = true;
>  					goto retry;
>  				}
>  			}
> @@ -4640,6 +4646,7 @@ static void make_request(struct mddev *m
>  				if (must_retry) {
>  					release_stripe(sh);
>  					schedule();
> +					do_prepare = true;
>  					goto retry;
>  				}
>  			}
> @@ -4663,8 +4670,10 @@ static void make_request(struct mddev *m
>  				prepare_to_wait(&conf->wait_for_overlap,
>  						&w, TASK_INTERRUPTIBLE);
>  				if (logical_sector >= mddev->suspend_lo &&
> -				    logical_sector < mddev->suspend_hi)
> +				    logical_sector < mddev->suspend_hi) {
>  					schedule();
> +					do_prepare = true;
> +				}
>  				goto retry;
>  			}
>  
> @@ -4677,9 +4686,9 @@ static void make_request(struct mddev *m
>  				md_wakeup_thread(mddev->thread);
>  				release_stripe(sh);
>  				schedule();
> +				do_prepare = true;
>  				goto retry;
>  			}
> -			finish_wait(&conf->wait_for_overlap, &w);
>  			set_bit(STRIPE_HANDLE, &sh->state);
>  			clear_bit(STRIPE_DELAYED, &sh->state);
>  			if ((bi->bi_rw & REQ_SYNC) &&
> @@ -4689,10 +4698,10 @@ static void make_request(struct mddev *m
>  		} else {
>  			/* cannot get stripe for read-ahead, just give-up */
>  			clear_bit(BIO_UPTODATE, &bi->bi_flags);
> -			finish_wait(&conf->wait_for_overlap, &w);
>  			break;
>  		}
>  	}
> +	finish_wait(&conf->wait_for_overlap, &w);
>  
>  	remaining = raid5_dec_bi_active_stripes(bi);
>  	if (remaining == 0) {


Applied, thanks.

NeilBrown
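
For readers following the thread, the effect of the applied patch can be
sketched in userspace. This is only a toy model, not kernel code: struct
wait_stats, count_per_stripe() and count_per_bio() are hypothetical names, and
the waits[] array (how many times each stripe has to sleep before it can
proceed) is an assumed input. It simply counts how often
prepare_to_wait()/finish_wait() would be called under each scheme.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical counters; each prepare/finish call stands for one trip
 * through the wait-queue lock in the real kernel helpers. */
struct wait_stats {
	unsigned int prepare_calls;
	unsigned int finish_calls;
	unsigned int sleeps;
};

/* Old scheme: prepare on every attempt, finish once per stripe. */
static struct wait_stats count_per_stripe(const int *waits, size_t nr_stripes)
{
	struct wait_stats s = { 0, 0, 0 };

	for (size_t i = 0; i < nr_stripes; i++) {
		int left = waits[i];

		assert(left >= 0);
		for (;;) {
			s.prepare_calls++;	/* prepare_to_wait() */
			if (left == 0)
				break;		/* stripe is usable */
			left--;
			s.sleeps++;		/* schedule(); goto retry */
		}
		s.finish_calls++;		/* finish_wait() */
	}
	return s;
}

/* New scheme: prepare once per bio, re-prepare only after a sleep. */
static struct wait_stats count_per_bio(const int *waits, size_t nr_stripes)
{
	struct wait_stats s = { 0, 0, 0 };

	s.prepare_calls++;			/* hoisted prepare_to_wait() */
	for (size_t i = 0; i < nr_stripes; i++) {
		int left = waits[i];
		bool do_prepare = false;

		assert(left >= 0);
		for (;;) {
			if (do_prepare)
				s.prepare_calls++;
			if (left == 0)
				break;
			left--;
			s.sleeps++;		/* schedule() */
			do_prepare = true;	/* must re-arm before retrying */
		}
	}
	s.finish_calls++;			/* hoisted finish_wait() */
	return s;
}
```

For a bio spanning four stripes that never has to sleep, the per-stripe scheme
makes four prepare and four finish calls, while the per-bio scheme makes one of
each. Once a stripe does sleep, both schemes pay one extra prepare call per
retry, so the saving comes only from the uncontended stripes.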
