Commit 7dd76d1feec7 ("dm: improve bio splitting and associated IO accounting") removes cloned bio when dm io splitting is needed. This way will make multiple dm io instance sharing same original bio, and it works fine if IOs are completed successfully. But regression may be caused if BLK_STS_DM_REQUEUE is returned from either one of cloned io. If case of BLK_STS_DM_REQUEUE from one cloned io, only the mapped part of original bio for the current exact dm io needs to be re-submitted. However, since the original bio is shared among all dm io instances, actually the original bio only represents the last dm io instance, so requeue can't work as expected. Also when more than one dm io is requeued, the same original bio is requeued from all dm io's completion handler, then race is caused. Fix the issue by still allocating one bio for completing io only, then io accounting can reply on ->orig_bio. Based on one earlier version from Mike. In theory, we can delay the bio clone when BLK_STS_DM_REQUEUE happens, but that approach is a bit complicated: 1) bio clone needs to be done in task context; 2) one block interface for unwinding bio is required. Cc: Benjamin Marzinski <bmarzins@xxxxxxxxxx> Fixes: 7dd76d1feec7 ("dm: improve bio splitting and associated IO accounting") Signed-off-by: Ming Lei <ming.lei@xxxxxxxxxx> --- drivers/md/dm-core.h | 2 +- drivers/md/dm.c | 10 ++++++---- 2 files changed, 7 insertions(+), 5 deletions(-) diff --git a/drivers/md/dm-core.h b/drivers/md/dm-core.h index 54c0473a51dd..32f461c624c6 100644 --- a/drivers/md/dm-core.h +++ b/drivers/md/dm-core.h @@ -273,7 +273,7 @@ struct dm_io { struct mapped_device *md; /* The three fields represent mapped part of original bio */ - struct bio *orig_bio; + struct bio *orig_bio, *bak_bio; unsigned int sector_offset; /* offset to end of orig_bio */ unsigned int sectors; diff --git a/drivers/md/dm.c b/drivers/md/dm.c index 9ede55278eec..85d8f2f1c9c8 100644 --- a/drivers/md/dm.c +++ b/drivers/md/dm.c @@ -594,6 +594,7 @@ static struct dm_io *alloc_io(struct mapped_device *md, struct bio *bio) atomic_set(&io->io_count, 2); this_cpu_inc(*md->pending_io); io->orig_bio = bio; + io->bak_bio = NULL; io->md = md; spin_lock_init(&io->lock); io->start_time = jiffies; @@ -887,7 +888,7 @@ static void dm_io_complete(struct dm_io *io) { blk_status_t io_error; struct mapped_device *md = io->md; - struct bio *bio = io->orig_bio; + struct bio *bio = io->bak_bio ? io->bak_bio : io->orig_bio; if (io->status == BLK_STS_DM_REQUEUE) { unsigned long flags; @@ -1693,9 +1694,10 @@ static void dm_split_and_process_bio(struct mapped_device *md, * Remainder must be passed to submit_bio_noacct() so it gets handled * *after* bios already submitted have been completely processed. */ - bio_trim(bio, io->sectors, ci.sector_count); - trace_block_split(bio, bio->bi_iter.bi_sector); - bio_inc_remaining(bio); + io->bak_bio = bio_split(bio, bio_sectors(bio) - ci.sector_count, + GFP_NOIO, &md->queue->bio_split); + bio_chain(io->bak_bio, bio); + trace_block_split(io->bak_bio, bio->bi_iter.bi_sector); submit_bio_noacct(bio); out: /* -- 2.31.1 -- dm-devel mailing list dm-devel@xxxxxxxxxx https://listman.redhat.com/mailman/listinfo/dm-devel