On Mon, 8 Oct 2012 15:57:14 +0200 Jes.Sorensen@xxxxxxxxxx wrote: > From: Jes Sorensen <Jes.Sorensen@xxxxxxxxxx> > > Avoid calling rdev_set_badblocks() if the underlying device has been > removed by raid1d() prior to calling fix_read_error(). > > This should be applicable to the 3.x stable kernel series as well. > > Original bug analysis from Peng Yu: > > Assume this condition: > > We have a software raid1 device /dev/md0, it has two underlying > devices, they are /dev/sda and /dev/sdb. Assign an IO to sda, > rdev->pending add 1, an IO error occur on sda, rdev->pending dec 1, > then wakeup the raid1d thread. At the same time, a user run such a > command: > > Kernel will set Faulty bit to rdev->flags, then wakeup raid1d > thread. The raid1d thread will call md_check_recovery function, this > function will call raid1_remove_disk function if it see the Faulty > flag is set. In raid1_remove_disk function, the rdev->nr_pending is > zero, so it will set rdev to NULL. Then raid1d thread return from > md_check_recovery function, continue call handle_read_error to hand > the read error, handle_read_error function will call fix_read_error > function, in fix_read_error function, see these lines: > > if (!success) { > /* Cannot read from anywhere - mark it bad */ > struct md_rdev *rdev = conf->mirrors[read_disk].rdev; > if (!rdev_set_badblocks(rdev, sect, s, 0)) > md_error(mddev, rdev); > break; > } > > It doesn't check whether the rdev is NULL, just pass it to > rdev_set_badblocks function, and it maybe a NULL pointer, so the > kernel crash. > > Reported-by: Peng Yu <pyu@xxxxxxxxxx> > Signed-off-by: Jes Sorensen <Jes.Sorensen@xxxxxxxxxx> > --- > drivers/md/raid1.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c > index 611b5f7..ffdad74 100644 > --- a/drivers/md/raid1.c > +++ b/drivers/md/raid1.c > @@ -2005,7 +2005,7 @@ static void fix_read_error(struct r1conf *conf, int read_disk, > if (!success) { > /* Cannot read from anywhere - mark it bad */ > struct md_rdev *rdev = conf->mirrors[read_disk].rdev; > - if (!rdev_set_badblocks(rdev, sect, s, 0)) > + if (rdev && !rdev_set_badblocks(rdev, sect, s, 0)) > md_error(mddev, rdev); > break; > } Hi Jes, this was already reported on linux-raid and I said I didn't like that fix because I think it is wrong that the rdev could be removed while we are trying to fix a read error on it. So I've applied this following which is in -next. NeilBrown From c809064145d88b873c6acdced5a412aabbe8b8e5 Mon Sep 17 00:00:00 2001 From: NeilBrown <neilb@xxxxxxx> Date: Thu, 27 Sep 2012 12:37:27 +1000 Subject: [PATCH] md/raid1: Don't release reference to device while handling read error. When we get a read error, we arrange for raid1d to handle it. Currently we release the reference on the device. This can result in conf->mirrors[read_disk].rdev being NULL in fix_read_error, if the device happens to get removed before the read error is handled. So instead keep the reference until the read error has been fully handled. Reported-by: hank <pyu@xxxxxxxxxx> Signed-off-by: NeilBrown <neilb@xxxxxxx> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c index 3092afe..d705f9e 100644 --- a/drivers/md/raid1.c +++ b/drivers/md/raid1.c @@ -333,9 +333,10 @@ static void raid1_end_read_request(struct bio *bio, int error) spin_unlock_irqrestore(&conf->device_lock, flags); } - if (uptodate) + if (uptodate) { raid_end_bio_io(r1_bio); - else { + rdev_dec_pending(conf->mirrors[mirror].rdev, conf->mddev); + } else { /* * oops, read error: */ @@ -349,9 +350,8 @@ static void raid1_end_read_request(struct bio *bio, int error) (unsigned long long)r1_bio->sector); set_bit(R1BIO_ReadError, &r1_bio->state); reschedule_retry(r1_bio); + /* don't drop the reference on read_disk yet */ } - - rdev_dec_pending(conf->mirrors[mirror].rdev, conf->mddev); } static void close_write(struct r1bio *r1_bio) @@ -2228,6 +2228,7 @@ static void handle_read_error(struct r1conf *conf, struct r1bio *r1_bio) unfreeze_array(conf); } else md_error(mddev, conf->mirrors[r1_bio->read_disk].rdev); + rdev_dec_pending(conf->mirrors[r1_bio->read_disk].rdev, conf->mddev); bio = r1_bio->bios[r1_bio->read_disk]; bdevname(bio->bi_bdev, b);
Attachment:
signature.asc
Description: PGP signature