On Mon, 8 Jul 2013 16:06:56 -0400 Joe Lawrence <joe.lawrence@xxxxxxxxxxx> wrote: > [+cc Kent's new mail?] > > On Thu, 4 Jul 2013 10:53:37 +1000 > NeilBrown <neilb@xxxxxxx> wrote: > > > > I would propose "bio_rewind" that exactly undoes any "bio_advance". > > > > [ ... snip ...] > > > > Then call that on pbio at the same place we call bio_reset on sbio. > > > > You could probably also call bio_rewind on sbio, and remove lots of that > > code for setting the bio up again. > > This appears to work for my test case (no crashes or post repair > mismatch_cnt): > > mdadm --fail /dev/$MD /dev/sda3 > mdadm --remove /dev/$MD /dev/sda3 > dd if=/dev/urandom of=/dev/$MD bs=1M count=500 > mdadm --stop /dev/$MD > > mdadm --create /dev/$MD --level=1 --assume-clean --raid-devices=2 \ > --bitmap=internal /dev/sda3 /dev/sdi3 > echo repair > /sys/block/$MD/md/sync_action > > I'll reply to this email with the patches that implement your > suggested changes. Feel free to combine or redo them, posting them here > was the easiest way to provide my signed-off should you need it. > > Although it works, having to rewind the bio io_vec index (and > clearing the bi_next pointers) before calling bio_copy_data feels a > bit clunky. What bio_copy_data is really doing is > "bio_copy_remaining_data in a bio chain," whereas MD wants > "bio_copy_completed_data from a single bio". > > I took a look at Kent's tree and a lot of the block layer handling was > simplified through the bvec_iter. I don't know if that code is > destined for 3.11. If so, it would probably be easier to retest and > base any MD changes off of his. And for 3.10 stable, the minimal > fix would be for MD to just manipulate the bi_idx itself (or revert > calling bio_copy_data altogether.) > Hi, just letting you know I've decided to fix this differently for 3.11 and 3.10.y. I noticed there was another problem in that code, in that the len argument passed to memcmp was bv_len which might have been modified. So I moved the "fix up the bio" code to the top of the function and applied it to all relevant bios. I'd still like to go with bio_rewind or similar, but it's probably best for that to wait for the other bio stuff to settle. Maybe it won't be needed. NeilBrown commit 9a3856f56b467425e13a19d95524524e76eab040 Author: NeilBrown <neilb@xxxxxxx> Date: Wed Jul 17 15:19:29 2013 +1000 md/raid1: fix bio handling problems in process_checks() Recent change to use bio_copy_data() in raid1 when repairing an array is faulty. The underlying may have changed the bio in various ways using bio_advance and these need to be undone not just for the 'sbio' which is being copied to, but also the 'pbio' (primary) which is being copied from. So perform the reset on all bios that were read from and do it early. This also ensure that the sbio->bi_io_vec[j].bv_len passed to memcmp is correct. Reported-by: Joe Lawrence <joe.lawrence@xxxxxxxxxxx> Signed-off-by: NeilBrown <neilb@xxxxxxx> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c index ec73458..d60412c 100644 --- a/drivers/md/raid1.c +++ b/drivers/md/raid1.c @@ -1849,6 +1849,36 @@ static int process_checks(struct r1bio *r1_bio) int i; int vcnt; + /* Fix variable parts of all bios */ + vcnt = (r1_bio->sectors + PAGE_SIZE / 512 - 1) >> (PAGE_SHIFT - 9); + for (i = 0; i < conf->raid_disks * 2; i++) { + int j; + int size; + struct bio *b = r1_bio->bios[i]; + if (b->bi_end_io != end_sync_read) + continue; + /* fixup the bio for reuse */ + bio_reset(b); + b->bi_vcnt = vcnt; + b->bi_size = r1_bio->sectors << 9; + b->bi_sector = r1_bio->sector + + conf->mirrors[i].rdev->data_offset; + b->bi_bdev = conf->mirrors[i].rdev->bdev; + b->bi_end_io = end_sync_read; + b->bi_private = r1_bio; + + size = b->bi_size; + for (j = 0; j < vcnt ; j++) { + struct bio_vec *bi; + bi = &b->bi_io_vec[j]; + bi->bv_offset = 0; + if (size > PAGE_SIZE) + bi->bv_len = PAGE_SIZE; + else + bi->bv_len = size; + size -= PAGE_SIZE; + } + } for (primary = 0; primary < conf->raid_disks * 2; primary++) if (r1_bio->bios[primary]->bi_end_io == end_sync_read && test_bit(BIO_UPTODATE, &r1_bio->bios[primary]->bi_flags)) { @@ -1857,12 +1887,10 @@ static int process_checks(struct r1bio *r1_bio) break; } r1_bio->read_disk = primary; - vcnt = (r1_bio->sectors + PAGE_SIZE / 512 - 1) >> (PAGE_SHIFT - 9); for (i = 0; i < conf->raid_disks * 2; i++) { int j; struct bio *pbio = r1_bio->bios[primary]; struct bio *sbio = r1_bio->bios[i]; - int size; if (sbio->bi_end_io != end_sync_read) continue; @@ -1888,27 +1916,6 @@ static int process_checks(struct r1bio *r1_bio) rdev_dec_pending(conf->mirrors[i].rdev, mddev); continue; } - /* fixup the bio for reuse */ - bio_reset(sbio); - sbio->bi_vcnt = vcnt; - sbio->bi_size = r1_bio->sectors << 9; - sbio->bi_sector = r1_bio->sector + - conf->mirrors[i].rdev->data_offset; - sbio->bi_bdev = conf->mirrors[i].rdev->bdev; - sbio->bi_end_io = end_sync_read; - sbio->bi_private = r1_bio; - - size = sbio->bi_size; - for (j = 0; j < vcnt ; j++) { - struct bio_vec *bi; - bi = &sbio->bi_io_vec[j]; - bi->bv_offset = 0; - if (size > PAGE_SIZE) - bi->bv_len = PAGE_SIZE; - else - bi->bv_len = size; - size -= PAGE_SIZE; - } bio_copy_data(sbio, pbio); }
Attachment:
signature.asc
Description: PGP signature