On Mon, 9 Jul 2012 11:09:56 +1000 NeilBrown <neilb@xxxxxxx> wrote:

> On Fri, 06 Jul 2012 14:33:47 +0200 Arnold Schulz <arnysch@xxxxxxx> wrote:
>
> > Hi all,
> >
> > about 8 seconds after inserting the leap second, a running raid1
> > resync crashed.
>
> Thanks for the report.
>
> I think you mean "8 minutes" (though it was really 7 minutes and 12 seconds).
>
> Also it was a 'data-check' rather than a 'resync' :-)
>
> It is extremely unlikely that the two are related.
>
> There appears to be a use-after-free bug in the data-check code which you
> have managed to hit.  It has been there since 2006 (2.6.16) when data-check
> was added to raid1, and you are the first known victim.  Well done!
>
> I'll submit a patch shortly.
>

Below is the patch I'll be submitting, once it has been in -next for a day
or two.

Thanks,
NeilBrown

From 2d4f4f3384d4ef4f7c571448e803a1ce721113d5 Mon Sep 17 00:00:00 2001
From: NeilBrown <neilb@xxxxxxx>
Date: Mon, 9 Jul 2012 11:34:13 +1000
Subject: [PATCH] md/raid1: fix use-after-free bug in RAID1 data-check code.

This bug has been present ever since data-check was introduced in 2.6.16.
However it would only fire if a data-check were done on a degraded array,
which was only possible if the array has 3 or more devices.  This is
certainly possible, but is quite uncommon.

Since hot-replace was added in 3.3 it can happen more often as the
same condition can arise if not all possible replacements are present.

The problem is that as soon as we submit the last read request, the
'r1_bio' structure could be freed at any time, so we really should
stop looking at it.  If the last device is being read from we will
stop looking at it.  However if the last device is not due to be read
from, we will still check the bio pointer in the r1_bio, but the
r1_bio might already be free.

So use the read_targets counter to make sure we stop looking for bios
to submit as soon as we have submitted them all.

This fix is suitable for any -stable kernel since 2.6.16.

Cc: stable@xxxxxxxxxxxxxxx
Reported-by: Arnold Schulz <arnysch@xxxxxxx>
Signed-off-by: NeilBrown <neilb@xxxxxxx>

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 8c2754f..240ff31 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -2485,9 +2485,10 @@ static sector_t sync_request(struct mddev *mddev, sector_t sector_nr, int *skipp
 	 */
 	if (test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery)) {
 		atomic_set(&r1_bio->remaining, read_targets);
-		for (i = 0; i < conf->raid_disks * 2; i++) {
+		for (i = 0; i < conf->raid_disks * 2 && read_targets; i++) {
 			bio = r1_bio->bios[i];
 			if (bio->bi_end_io == end_sync_read) {
+				read_targets--;
 				md_sync_acct(bio->bi_bdev, nr_sectors);
 				generic_make_request(bio);
 			}
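
For anyone who wants to see the failure mode outside the kernel, here is a
rough user-space sketch of the two loop shapes.  It is not the raid1 code:
struct mock_r1_bio and every mock_* / submit_all_* name are made up for
illustration, reads are assumed to complete immediately (the worst case),
and "freed" is only a flag so that late accesses can be counted instead of
crashing.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_SLOTS 8

/* Hypothetical stand-in for r1_bio; not the kernel structure. */
struct mock_r1_bio {
	bool is_read_target[MAX_SLOTS];	/* models bi_end_io == end_sync_read */
	int remaining;			/* models r1_bio->remaining */
	bool freed;			/* set when the last read "completes" */
	int late_accesses;		/* slot inspections made after "free" */
};

/* Fill in the mock and return the number of read targets. */
static int mock_init(struct mock_r1_bio *r, const bool *targets, int n)
{
	int i, read_targets = 0;

	assert(n <= MAX_SLOTS);
	for (i = 0; i < n; i++) {
		r->is_read_target[i] = targets[i];
		if (targets[i])
			read_targets++;
	}
	r->remaining = read_targets;
	r->freed = false;
	r->late_accesses = 0;
	return read_targets;
}

/* Assumption: the read completes at once; the last one releases the owner. */
static void mock_submit(struct mock_r1_bio *r)
{
	if (--r->remaining == 0)
		r->freed = true;
}

/* Models looking at r1_bio->bios[i]; counts accesses after "free". */
static bool mock_slot_is_read(struct mock_r1_bio *r, int i)
{
	if (r->freed)
		r->late_accesses++;
	return r->is_read_target[i];
}

/* Loop shape before the patch: every slot is inspected. */
static int submit_all_unpatched(const bool *targets, int n)
{
	struct mock_r1_bio r;
	int i;

	mock_init(&r, targets, n);
	for (i = 0; i < n; i++)
		if (mock_slot_is_read(&r, i))
			mock_submit(&r);
	return r.late_accesses;
}

/* Loop shape after the patch: stop once every read has been submitted. */
static int submit_all_patched(const bool *targets, int n)
{
	struct mock_r1_bio r;
	int i;
	int read_targets = mock_init(&r, targets, n);

	for (i = 0; i < n && read_targets; i++)
		if (mock_slot_is_read(&r, i)) {
			read_targets--;
			mock_submit(&r);
		}
	return r.late_accesses;
}
```

With a slot layout such as { read, read, none, none }, the unpatched loop
inspects the two trailing slots after the last read has completed, while
the patched loop stops immediately.  With { read, none, read }, both loops
are clean, which matches the observation in the commit message that the
bug only shows up when the last device is not being read from.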