On Wed, Apr 13, 2022 at 10:25:45PM -0400, Mike Snitzer wrote:
> On Wed, Apr 13 2022 at 8:36P -0400,
> Ming Lei <ming.lei@xxxxxxxxxx> wrote:
>
> > On Wed, Apr 13, 2022 at 01:58:54PM -0400, Mike Snitzer wrote:
> > >
> > > The bigger issue with this patch is that you've caused
> > > dm_submit_bio_remap() to go back to accounting the entire original bio
> > > before any split occurs. That is a problem because you'll end up
> > > accounting that bio for every split, so in split heavy workloads the
> > > IO accounting won't reflect when the IO is actually issued and we'll
> > > regress back to having very inaccurate and incorrect IO accounting for
> > > dm_submit_bio_remap() heavy targets (e.g. dm-crypt).
> >
> > Good catch, but we know the length of mapped part in original bio before
> > calling __map_bio(), so io->sectors/io->offset_sector can be setup here,
> > something like the following delta change should address it:
> >
> > diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> > index db23efd6bbf6..06b554f3104b 100644
> > --- a/drivers/md/dm.c
> > +++ b/drivers/md/dm.c
> > @@ -1558,6 +1558,13 @@ static int __split_and_process_bio(struct clone_info *ci)
> >
> >  	len = min_t(sector_t, max_io_len(ti, ci->sector), ci->sector_count);
> >  	clone = alloc_tio(ci, ti, 0, &len, GFP_NOIO);
> > +
> > +	if (ci->sector_count > len) {
> > +		/* setup the mapped part for accounting */
> > +		dm_io_set_flag(ci->io, DM_IO_SPLITTED);
> > +		ci->io->sectors = len;
> > +		ci->io->sector_offset = bio_end_sector(ci->bio) - ci->sector;
> > +	}
> >  	__map_bio(clone);
> >
> >  	ci->sector += len;
> > @@ -1603,11 +1610,6 @@ static void dm_split_and_process_bio(struct mapped_device *md,
> >  	if (error || !ci.sector_count)
> >  		goto out;
> >
> > -	/* setup the mapped part for accounting */
> > -	dm_io_set_flag(ci.io, DM_IO_SPLITTED);
> > -	ci.io->sectors = bio_sectors(bio) - ci.sector_count;
> > -	ci.io->sector_offset = bio_end_sector(bio) - bio->bi_iter.bi_sector;
> > -
> >  	bio_trim(bio, ci.io->sectors, ci.sector_count);
> >  	trace_block_split(bio, bio->bi_iter.bi_sector);
> >  	bio_inc_remaining(bio);
> >
> > --
> > Ming
> >
>
> Unfortunately we do need splitting after __map_bio() because a dm
> target's ->map can use dm_accept_partial_bio() to further reduce a
> bio's mapped part.
>
> But I think dm_accept_partial_bio() could be trained to update
> tio->io->sectors?

->orig_bio only serves io accounting, but ->orig_bio isn't passed to
dm_accept_partial_bio(), and it doesn't get updated after
dm_accept_partial_bio() is called.

If that is an issue, it must be an existing issue in dm io accounting,
since ->orig_bio isn't updated when dm_accept_partial_bio() is called.

So do we have to update it?

>
> dm_accept_partial_bio() has been around for a long time, it keeps
> growing BUG_ONs that are actually helpful to narrow its use to "normal
> IO", so it should be OK.
>
> Running 'make check' in a built cryptsetup source tree should be a
> good test for DM target interface functionality.

Care to share the test tree?

>
> But there aren't automated tests for IO accounting correctness yet.

I did verify io accounting by running dm-thin with blk-throttle, and the
observed throughput is the same as the expected setting. I ran both small
bs and large bs, so both the non-split and split code paths are covered.

Maybe you can add this kind of test to the dm io accounting automated
tests.

Thanks,
Ming
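
For reference, the accounting question above can be modeled in userspace.
The sketch below is not kernel code: struct io_acct and the acct_*()
helpers are hypothetical names that only mirror the io->sectors and
io->sector_offset fields from the quoted delta. The update done in
acct_accept_partial() is an assumption about what "training"
dm_accept_partial_bio() to update tio->io->sectors might look like, and
the model assumes the clone starts at the first sector of the bio.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t sector_t;

/* Hypothetical stand-in for the accounting fields of struct dm_io. */
struct io_acct {
	bool split;             /* models the DM_IO_SPLITTED flag */
	sector_t sectors;       /* length of the part to account */
	sector_t sector_offset; /* start of that part to end of original bio */
};

/* Start with the whole bio accounted and no split recorded. */
static void acct_init(struct io_acct *io, sector_t bio_sectors)
{
	io->split = false;
	io->sectors = bio_sectors;
	io->sector_offset = bio_sectors;
}

/*
 * Models the quoted delta: record the mapped part before ->map runs.
 * 'remaining' plays the role of ci->sector_count, 'len' the clamped
 * length. Assumption: bio_end_sector(bio) - ci->sector == remaining.
 */
static void acct_set_mapped(struct io_acct *io, sector_t remaining,
			    sector_t len)
{
	assert(len <= remaining);
	if (remaining > len) {
		io->split = true;
		io->sectors = len;
		io->sector_offset = remaining;
	}
}

/*
 * Assumed behaviour, not the current kernel: a target accepting only
 * n_sectors also shrinks the accounted length.
 */
static void acct_accept_partial(struct io_acct *io, sector_t n_sectors)
{
	assert(n_sectors <= io->sectors);
	if (n_sectors < io->sectors) {
		io->split = true;
		io->sectors = n_sectors;
	}
}

/* Sectors left over for resubmission after the accounted part. */
static sector_t acct_unmapped(const struct io_acct *io)
{
	return io->sector_offset - io->sectors;
}
```

With a 1024-sector bio clamped to 256 sectors and a target that then
accepts only 64, this model accounts 64 sectors and leaves 960 for
resubmission, which is the case the pre-__map_bio() setup alone would
miss.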