Shaohua Li <shli@xxxxxx> writes:

> On Wed, Sep 23, 2015 at 04:21:58PM +1000, Neil Brown wrote:
>> Shaohua Li <shli@xxxxxx> writes:
>>
>> > handle_failed_stripe() makes the stripe fail, e.g., all IO will return
>> > with a failure, but it doesn't update stripe_head_state. Later
>> > handle_stripe() has special handling for raid6 for handle_stripe_fill().
>> > That check before handle_stripe_fill() doesn't skip the failed stripe
>> > and we get a kernel crash in need_this_block. This patch clears the
>> > analysis state to make sure no functions are wrongly called after
>> > handle_failed_stripe()
>> >
>> > Signed-off-by: Shaohua Li <shli@xxxxxx>
>> > ---
>> >  drivers/md/raid5.c | 4 ++++
>> >  1 file changed, 4 insertions(+)
>> >
>> > diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
>> > index 394cdf8..8e4fb89a 100644
>> > --- a/drivers/md/raid5.c
>> > +++ b/drivers/md/raid5.c
>> > @@ -3155,6 +3155,8 @@ handle_failed_stripe(struct r5conf *conf, struct stripe_head *sh,
>> >  		spin_unlock_irq(&sh->stripe_lock);
>> >  		if (test_and_clear_bit(R5_Overlap, &sh->dev[i].flags))
>> >  			wake_up(&conf->wait_for_overlap);
>> > +		if (bi)
>> > +			s->to_read--;
>> >  		while (bi && bi->bi_iter.bi_sector <
>> >  			sh->dev[i].sector + STRIPE_SECTORS) {
>> >  			struct bio *nextbi =
>> > @@ -3173,6 +3175,8 @@ handle_failed_stripe(struct r5conf *conf, struct stripe_head *sh,
>> >  			 */
>> >  			clear_bit(R5_LOCKED, &sh->dev[i].flags);
>> >  	}
>> > +	s->to_write = 0;
>> > +	s->written = 0;
>> >
>> >  	if (test_and_clear_bit(STRIPE_FULL_WRITE, &sh->state))
>> >  		if (atomic_dec_and_test(&conf->pending_full_writes))
>> > --
>> > 1.8.1
>>
>> Again, this probably is a sensible fix, but I would like to be certain.
>> Where exactly in need_this_block does the kernel crash?  I cannot see
>> anything that could cause an invalid address....
>
>
>>>for (i = 0; i < s->failed; i++) {
>>>	if (fdev[i]->towrite &&
> the fdev[i]->towrite. because s->failed >=2 (it's 3 in my case), while
> the array size is 2.
>
> Thanks,
> Shaohua

Ahh, of course.
In that case I think I'd like to limit the for loop as well.
So I've applied your patch and this one as well.

Thanks,
NeilBrown

From 76e308d70b204ff0af0028458caabfeacac4541a Mon Sep 17 00:00:00 2001
From: NeilBrown <neilb@xxxxxxxx>
Date: Thu, 24 Sep 2015 15:25:36 +1000
Subject: [PATCH] md/raid5: don't index beyond end of array in need_this_block().

While need_this_block probably shouldn't be called when there are more
than 2 failed devices, we really don't want it to try indexing beyond
the end of the failed_num[] or fdev[] arrays.

So limit the loops to at most 2 iterations.

Reported-by: Shaohua Li <shli@xxxxxx>
Signed-off-by: NeilBrown <neilb@xxxxxxx>

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 903d8a2b7b07..0f49ce411c9a 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -3304,7 +3304,7 @@ static int need_this_block(struct stripe_head *sh, struct stripe_head_state *s,
 		 */
 		return 0;
 
-	for (i = 0; i < s->failed; i++) {
+	for (i = 0; i < s->failed && i < 2; i++) {
 		if (fdev[i]->towrite &&
 		    !test_bit(R5_UPTODATE, &fdev[i]->flags) &&
 		    !test_bit(R5_OVERWRITE, &fdev[i]->flags))
@@ -3328,7 +3328,7 @@ static int need_this_block(struct stripe_head *sh, struct stripe_head_state *s,
 	    sh->sector < sh->raid_conf->mddev->recovery_cp)
 		/* reconstruct-write isn't being forced */
 		return 0;
-	for (i = 0; i < s->failed; i++) {
+	for (i = 0; i < s->failed && i < 2; i++) {
 		if (s->failed_num[i] != sh->pd_idx &&
 		    s->failed_num[i] != sh->qd_idx &&
 		    !test_bit(R5_UPTODATE, &fdev[i]->flags) &&
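
For readers following the thread, the bounded-loop idea in the patch above can be sketched outside the kernel. This is only an illustration under stated assumptions: struct toy_dev, struct toy_state, MAX_TRACKED_FAILED and count_failed_with_writes() are hypothetical stand-ins and do not exist in raid5.c. The one point taken from the thread is that the failure count can be 3 while the per-failure array has only 2 slots.

```c
/* Assumption: mirrors the 2-entry fdev[] array discussed in the thread. */
#define MAX_TRACKED_FAILED 2

/* Hypothetical stand-in for a per-device record; not the kernel's r5dev. */
struct toy_dev {
	int towrite;	/* non-zero if a write is pending on this device */
};

/* Hypothetical stand-in for the analysis state; not stripe_head_state. */
struct toy_state {
	int failed;	/* total failed devices; may exceed MAX_TRACKED_FAILED */
};

/*
 * Count tracked failed devices that have a pending write.
 * The second loop condition keeps i inside the 2-entry array even
 * when s->failed is larger than the array.
 */
static int count_failed_with_writes(const struct toy_state *s,
				    struct toy_dev *const fdev[MAX_TRACKED_FAILED])
{
	int i, n = 0;

	for (i = 0; i < s->failed && i < MAX_TRACKED_FAILED; i++)
		if (fdev[i]->towrite)
			n++;
	return n;
}
```

With failed set to 3 and both tracked devices holding pending writes, the function inspects only fdev[0] and fdev[1]. Without the second condition the loop would also dereference fdev[2], which is the out-of-bounds access reported above.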