Re: [PATCH 2/2] raid5: update analysis state for failed stripe

Neil Brown <neilb@xxxxxxx> · Wed, 23 Sep 2015 16:21:58 +1000

Shaohua Li <shli@xxxxxx> writes:

> handle_failed_stripe() makes the stripe fail, i.e., all IO will return
> with a failure, but it doesn't update stripe_head_state. Later,
> handle_stripe() has special handling for raid6 for handle_stripe_fill().
> That check before handle_stripe_fill() doesn't skip the failed stripe,
> and we get a kernel crash in need_this_block(). This patch clears the
> analysis state to make sure no functions are wrongly called after
> handle_failed_stripe().
>
> Signed-off-by: Shaohua Li <shli@xxxxxx>
> ---
>  drivers/md/raid5.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index 394cdf8..8e4fb89a 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
> @@ -3155,6 +3155,8 @@ handle_failed_stripe(struct r5conf *conf, struct stripe_head *sh,
>  			spin_unlock_irq(&sh->stripe_lock);
>  			if (test_and_clear_bit(R5_Overlap, &sh->dev[i].flags))
>  				wake_up(&conf->wait_for_overlap);
> +			if (bi)
> +				s->to_read--;
>  			while (bi && bi->bi_iter.bi_sector <
>  			       sh->dev[i].sector + STRIPE_SECTORS) {
>  				struct bio *nextbi =
> @@ -3173,6 +3175,8 @@ handle_failed_stripe(struct r5conf *conf, struct stripe_head *sh,
>  		 */
>  		clear_bit(R5_LOCKED, &sh->dev[i].flags);
>  	}
> +	s->to_write = 0;
> +	s->written = 0;
>  
>  	if (test_and_clear_bit(STRIPE_FULL_WRITE, &sh->state))
>  		if (atomic_dec_and_test(&conf->pending_full_writes))
> -- 
> 1.8.1

Again, this probably is a sensible fix, but I would like to be certain.
Where exactly in need_this_block does the kernel crash?  I cannot see
anything that could cause an invalid address....
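
To make the scenario in the patch description concrete, here is a minimal
user-space sketch of the pattern, not the kernel code. The names
state_model, fail_stripe() and should_fill() are hypothetical stand-ins;
only the counters to_read, to_write and written come from the patch. The
model zeroes to_read for brevity, whereas the patch decrements it once per
device that had a pending read.

```c
/* Hypothetical stand-in for the counters kept in struct stripe_head_state. */
struct state_model {
	int to_read;   /* devices with pending reads */
	int to_write;  /* devices with pending writes */
	int written;   /* devices with completed writes awaiting cleanup */
};

/*
 * Models failing a stripe. With clear_state set, it mimics the patched
 * behaviour: the counters are reset so later checks see nothing pending.
 * Assumption: zeroing to_read stands in for the per-device decrement.
 */
static void fail_stripe(struct state_model *s, int clear_state)
{
	/* ... pending bios would be completed with an error here ... */
	if (clear_state) {
		s->to_read = 0;
		s->to_write = 0;
		s->written = 0;
	}
}

/*
 * Simplified "is there still something to fetch?" check, loosely in the
 * spirit of the test that guards handle_stripe_fill(). Returns 1 if any
 * counter is still non-zero.
 */
static int should_fill(const struct state_model *s)
{
	return s->to_read || s->to_write || s->written;
}
```

In this model, calling fail_stripe() with clear_state == 0 leaves
should_fill() returning non-zero for a stripe that has already been failed,
which is the stale-state condition the patch description refers to. It does
not by itself explain where an invalid address would come from.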

Thanks,
NeilBrown