On Sat, Jan 07 2017, Mike Snitzer wrote: > On Fri, Jan 06 2017 at 12:34pm -0500, > Mikulas Patocka <mpatocka@xxxxxxxxxx> wrote: > >> >> >> On Fri, 6 Jan 2017, Mikulas Patocka wrote: >> >> > >> > >> > On Wed, 4 Jan 2017, Mike Snitzer wrote: >> > >> > > On Wed, Jan 04 2017 at 12:12am -0500, >> > > NeilBrown <neilb@xxxxxxxx> wrote: >> > > >> > > > > Suggested-by: NeilBrown <neilb@xxxxxxxx> >> > > > > Signed-off-by: Jack Wang <jinpu.wang@xxxxxxxxxxxxxxxx> >> > > > > --- >> > > > > block/blk-core.c | 20 ++++++++++++++++++++ >> > > > > 1 file changed, 20 insertions(+) >> > > > > >> > > > > diff --git a/block/blk-core.c b/block/blk-core.c >> > > > > index 9e3ac56..47ef373 100644 >> > > > > --- a/block/blk-core.c >> > > > > +++ b/block/blk-core.c >> > > > > @@ -2138,10 +2138,30 @@ blk_qc_t generic_make_request(struct bio *bio) >> > > > > struct request_queue *q = bdev_get_queue(bio->bi_bdev); >> > > > > >> > > > > if (likely(blk_queue_enter(q, __GFP_DIRECT_RECLAIM) == 0)) { >> > > > > + struct bio_list lower, same, hold; >> > > > > + >> > > > > + /* Create a fresh bio_list for all subordinate requests */ >> > > > > + bio_list_init(&hold); >> > > > > + bio_list_merge(&hold, &bio_list_on_stack); >> > > > > + bio_list_init(&bio_list_on_stack); >> > > > > >> > > > > ret = q->make_request_fn(q, bio); >> > > > > >> > > > > blk_queue_exit(q); >> > > > > + /* sort new bios into those for a lower level >> > > > > + * and those for the same level >> > > > > + */ >> > > > > + bio_list_init(&lower); >> > > > > + bio_list_init(&same); >> > > > > + while ((bio = bio_list_pop(&bio_list_on_stack)) != NULL) >> > > > > + if (q == bdev_get_queue(bio->bi_bdev)) >> > > > > + bio_list_add(&same, bio); >> > > > > + else >> > > > > + bio_list_add(&lower, bio); >> > > > > + /* now assemble so we handle the lowest level first */ >> > > > > + bio_list_merge(&bio_list_on_stack, &lower); >> > > > > + bio_list_merge(&bio_list_on_stack, &same); >> > > > > + bio_list_merge(&bio_list_on_stack, &hold); >> > > > > >> > > > > bio = bio_list_pop(current->bio_list); >> > > > > } else { >> > > > > -- >> > > > > 2.7.4 >> > > >> > > Mikulas, would you be willing to try the below patch with the >> > > dm-snapshot deadlock scenario and report back on whether it fixes that? >> > > >> > > Patch below looks to be the same as here: >> > > https://marc.info/?l=linux-raid&m=148232453107685&q=p3 >> > > >> > > Neil and/or others if that isn't the patch that should be tested please >> > > provide a pointer to the latest. >> > > >> > > Thanks, >> > > Mike >> > >> > The bad news is that this doesn't fix the snapshot deadlock. >> > >> > I created a test program for the snapshot deadlock bug (it was originally >> > created years ago to test for a different bug, so it contains some cruft). >> > You also need to insert "if (ci->sector_count) msleep(100);" to the end of >> > __split_and_process_non_flush to make the kernel sleep when splitting the >> > bio. >> > >> > And with the above above patch, the snapshot deadlock bug still happens. > > That is really unfortunate. Would be useful to dig in and understand > why. Because ordering of the IO in generic_make_request() really should > take care of it. I *think* you might be able to resolve this by changing __split_and_process_bio() to only ever perform a single split. No looping. i.e. if the bio is too big to handle directly, then split off the front to a new bio, which you bio_chain to the original. The original then has bio_advance() called to stop over the front, then generic_make_request() so it is queued. Then the code proceeds to __clone_and_map_data_bio() on the front that got split off. When that completes it *doesn't* loop round, but returns into generic_make_request() which does the looping and makes sure to handle the lowest-level bio next. something vaguely like this: diff --git a/drivers/md/dm.c b/drivers/md/dm.c index 3086da5664f3..06ee0960e415 100644 --- a/drivers/md/dm.c +++ b/drivers/md/dm.c @@ -1216,6 +1216,14 @@ static int __split_and_process_non_flush(struct clone_info *ci) len = min_t(sector_t, max_io_len(ci->sector, ti), ci->sector_count); + if (len < ci->sector_count) { + struct bio *split = bio_split(bio, len, GFP_NOIO, fs_bio_set); + bio_chain(split, bio); + generic_make_request(bio); + bio = split; + ci->sector_count = len; + } + r = __clone_and_map_data_bio(ci, ti, ci->sector, &len); if (r < 0) return r; though I haven't tested, and the change (if it works) should probably be more fully integrated into surrounding code. You probably don't want to use "fs_bio_set" either - a target-local pool would be best. NeilBrown
Attachment:
signature.asc
Description: PGP signature