Re: [patch 1/3]raid5: adjust order of some operations in handle_stripe

NeilBrown <neilb@xxxxxxx> · Wed, 28 May 2014 12:59:37 +1000

On Thu, 22 May 2014 19:24:31 +0800 Shaohua Li <shli@xxxxxxxxxx> wrote:

> 
> This is to revert ef5b7c69b7a1b8b8744a6168b6f. handle_stripe_clean_event()
> handles finished stripes, which really should be the first thing to do. The
> original changelog says checking reconstruct_state should be the first as
> handle_stripe_clean_event can clear some dev->flags and impact checking
> reconstruct_state code path. It's unclear to me why this happens, because I
> thought written finish and reconstruct_state equals to *_result can't happen in
> the same time.

"unclear to me" "I thought" are sufficient to justify a change, though they
are certainly sufficient to ask a question.

Are you asking a question or submitting a change?

You may well be correct that if reconstruct_state is not
reconstruct_state_idle, then handle_stripe_clean_event cannot possible be
called.  In that case, maybe we should change the code flow to make that more
obvious, but certainly the changelog comment should be clear about exactly
why.

> 
> I also moved checking reconstruct_state code path after handle_stripe_dirtying.
> If that code sets reconstruct_state to reconstruct_state_idle, the order change
> will make us miss one handle_stripe_dirtying. But the stripe will be eventually
> handled again when written is finished.

You haven't said here why this patch is a good thing, only why it isn't
obviously bad.  I really need some justification to make a change and you
haven't provided any, at least not in this changelog comment.

Maybe we need a completely different approach.
Instead of repeatedly shuffling code inside handle_stripe(), how about we put
all of handle_stripe inside a loop which runs as long as STRIPE_HANDLE is set
and sh->count == 1.
ie.

	if (test_and_set_bit_lock(STRIPE_ACTIVE, &sh->state)) {
		/* already being handled, ensure it gets handled
		 * again when current action finishes */
		set_bit(STRIPE_HANDLE, &sh->state);
		return;
	}

        do {
	        clear_bit(STRIPE_HANDLE, &sh->state);
                __handle_stripe(sh);
        } while (test_bit(STRIPE_HANDLE, &sh->state)
                 && atomic_read(&sh->count) == 1);
	clear_bit_unlock(STRIPE_ACTIVE, &sh->state);

where the rest of the current handle_stripe() goes in to __handle_stripe().

Would that address your performance concerns, or is there still too much
overhead?

It's not that I think your code is wrong, but I want the result to be
"obviously correct" as much as possible, and currently I don't think it is.

Thanks,
NeilBrown

> 
> Signed-off-by: Shaohua Li <shli@xxxxxxxxxxxx>
> ---
>  drivers/md/raid5.c |   74 ++++++++++++++++++++++++++---------------------------
>  1 file changed, 37 insertions(+), 37 deletions(-)
> 
> Index: linux/drivers/md/raid5.c
> ===================================================================
> --- linux.orig/drivers/md/raid5.c	2014-05-22 13:27:21.000000000 +0800
> +++ linux/drivers/md/raid5.c	2014-05-22 14:41:05.603276379 +0800
> @@ -3747,43 +3747,6 @@ static void handle_stripe(struct stripe_
>  			handle_failed_sync(conf, sh, &s);
>  	}
>  
> -	/* Now we check to see if any write operations have recently
> -	 * completed
> -	 */
> -	prexor = 0;
> -	if (sh->reconstruct_state == reconstruct_state_prexor_drain_result)
> -		prexor = 1;
> -	if (sh->reconstruct_state == reconstruct_state_drain_result ||
> -	    sh->reconstruct_state == reconstruct_state_prexor_drain_result) {
> -		sh->reconstruct_state = reconstruct_state_idle;
> -
> -		/* All the 'written' buffers and the parity block are ready to
> -		 * be written back to disk
> -		 */
> -		BUG_ON(!test_bit(R5_UPTODATE, &sh->dev[sh->pd_idx].flags) &&
> -		       !test_bit(R5_Discard, &sh->dev[sh->pd_idx].flags));
> -		BUG_ON(sh->qd_idx >= 0 &&
> -		       !test_bit(R5_UPTODATE, &sh->dev[sh->qd_idx].flags) &&
> -		       !test_bit(R5_Discard, &sh->dev[sh->qd_idx].flags));
> -		for (i = disks; i--; ) {
> -			struct r5dev *dev = &sh->dev[i];
> -			if (test_bit(R5_LOCKED, &dev->flags) &&
> -				(i == sh->pd_idx || i == sh->qd_idx ||
> -				 dev->written)) {
> -				pr_debug("Writing block %d\n", i);
> -				set_bit(R5_Wantwrite, &dev->flags);
> -				if (prexor)
> -					continue;
> -				if (!test_bit(R5_Insync, &dev->flags) ||
> -				    ((i == sh->pd_idx || i == sh->qd_idx)  &&
> -				     s.failed == 0))
> -					set_bit(STRIPE_INSYNC, &sh->state);
> -			}
> -		}
> -		if (test_and_clear_bit(STRIPE_PREREAD_ACTIVE, &sh->state))
> -			s.dec_preread_active = 1;
> -	}
> -
>  	/*
>  	 * might be able to return some write requests if the parity blocks
>  	 * are safe, or on a failed drive
> @@ -3827,6 +3790,43 @@ static void handle_stripe(struct stripe_
>  	if (s.to_write && !sh->reconstruct_state && !sh->check_state)
>  		handle_stripe_dirtying(conf, sh, &s, disks);
>  
> +	/* Now we check to see if any write operations have recently
> +	 * completed
> +	 */
> +	prexor = 0;
> +	if (sh->reconstruct_state == reconstruct_state_prexor_drain_result)
> +		prexor = 1;
> +	if (sh->reconstruct_state == reconstruct_state_drain_result ||
> +	    sh->reconstruct_state == reconstruct_state_prexor_drain_result) {
> +		sh->reconstruct_state = reconstruct_state_idle;
> +
> +		/* All the 'written' buffers and the parity block are ready to
> +		 * be written back to disk
> +		 */
> +		BUG_ON(!test_bit(R5_UPTODATE, &sh->dev[sh->pd_idx].flags) &&
> +		       !test_bit(R5_Discard, &sh->dev[sh->pd_idx].flags));
> +		BUG_ON(sh->qd_idx >= 0 &&
> +		       !test_bit(R5_UPTODATE, &sh->dev[sh->qd_idx].flags) &&
> +		       !test_bit(R5_Discard, &sh->dev[sh->qd_idx].flags));
> +		for (i = disks; i--; ) {
> +			struct r5dev *dev = &sh->dev[i];
> +			if (test_bit(R5_LOCKED, &dev->flags) &&
> +				(i == sh->pd_idx || i == sh->qd_idx ||
> +				 dev->written)) {
> +				pr_debug("Writing block %d\n", i);
> +				set_bit(R5_Wantwrite, &dev->flags);
> +				if (prexor)
> +					continue;
> +				if (!test_bit(R5_Insync, &dev->flags) ||
> +				    ((i == sh->pd_idx || i == sh->qd_idx)  &&
> +				     s.failed == 0))
> +					set_bit(STRIPE_INSYNC, &sh->state);
> +			}
> +		}
> +		if (test_and_clear_bit(STRIPE_PREREAD_ACTIVE, &sh->state))
> +			s.dec_preread_active = 1;
> +	}
> +
>  	/* maybe we need to check and possibly fix the parity for this stripe
>  	 * Any reads will already have been scheduled, so we just see if enough
>  	 * data is available.  The parity check is held off while parity

Attachment:
signature.asc

Description: PGP signature