On 4/11/22 23:18, Ming Lei wrote: >>>> This fixes the issue: >>>> >>>> diff --git a/drivers/md/dm.c b/drivers/md/dm.c >>>> index 3c5fad7c4ee6..3dd6735450c5 100644 >>>> --- a/drivers/md/dm.c >>>> +++ b/drivers/md/dm.c >>>> @@ -581,7 +581,7 @@ static struct dm_io *alloc_io(struct mapped_device >>>> *md, struct bio *bio) >>>> io->status = 0; >>>> atomic_set(&io->io_count, 1); >>>> this_cpu_inc(*md->pending_io); >>>> - io->orig_bio = NULL; >>>> + io->orig_bio = bio; >>>> io->md = md; >>>> io->map_task = current; >>>> spin_lock_init(&io->lock); >>>> >>>> Otherwise, the dm-zone.c code sees a NULL orig_bio. >>>> However, this change may be messing up the bio accounting. Need to check that. >>> >>> Looks it is one recent regression since: >>> >>> commit 0fbb4d93b38b ("dm: add dm_submit_bio_remap interface") >> >> Yep, saw that. Problem is, I really do not understand that change setting >> io->orig_bio *after* __map_bio() is called. It seems that the accounting >> is done on each fragment of the orig_bio instead of once for the entire >> BIO... So my "fix" above seems wrong. Apart from passing along orig_bio as >> an argument to __map_bio() from __split_and_process_bio(), I do not think >> my change is correct. Thoughts ? > > Frankly speaking, both changing ->orig_bio after split and setting ->orig_bio > after ->map() looks ugly & tricky, and the following change should avoid the > issue, meantime simplify dm accounting a bit: > > diff --git a/drivers/md/dm.c b/drivers/md/dm.c > index 3c5fad7c4ee6..f1fe83113608 100644 > --- a/drivers/md/dm.c > +++ b/drivers/md/dm.c > @@ -581,7 +581,7 @@ static struct dm_io *alloc_io(struct mapped_device *md, struct bio *bio) > io->status = 0; > atomic_set(&io->io_count, 1); > this_cpu_inc(*md->pending_io); > - io->orig_bio = NULL; > + io->orig_bio = bio; > io->md = md; > io->map_task = current; > spin_lock_init(&io->lock); > @@ -1223,19 +1223,11 @@ void dm_submit_bio_remap(struct bio *clone, struct bio *tgt_clone) > * Account io->origin_bio to DM dev on behalf of target > * that took ownership of IO with DM_MAPIO_SUBMITTED. > */ > - if (io->map_task == current) { > + if (io->map_task == current) > /* Still in target's map function */ > dm_io_set_flag(io, DM_IO_START_ACCT); > - } else { > - /* > - * Called by another thread, managed by DM target, > - * wait for dm_split_and_process_bio() to store > - * io->orig_bio > - */ > - while (unlikely(!smp_load_acquire(&io->orig_bio))) > - msleep(1); > + else Curly brackets around the else here. > dm_start_io_acct(io, clone); > - } > > __dm_submit_bio_remap(tgt_clone, disk_devt(io->md->disk), > tio->old_sector); > @@ -1562,7 +1554,7 @@ static void dm_split_and_process_bio(struct mapped_device *md, > struct dm_table *map, struct bio *bio) > { > struct clone_info ci; > - struct bio *orig_bio = NULL; > + struct bio *new_bio = NULL; > int error = 0; > > init_clone_info(&ci, md, map, bio); > @@ -1578,22 +1570,14 @@ static void dm_split_and_process_bio(struct mapped_device *md, > if (error || !ci.sector_count) > goto out; > > - /* > - * Remainder must be passed to submit_bio_noacct() so it gets handled > - * *after* bios already submitted have been completely processed. > - * We take a clone of the original to store in ci.io->orig_bio to be > - * used by dm_end_io_acct() and for dm_io_complete() to use for > - * completion handling. > - */ This comment should remain with some adjustment. > - orig_bio = bio_split(bio, bio_sectors(bio) - ci.sector_count, > - GFP_NOIO, &md->queue->bio_split); > - bio_chain(orig_bio, bio); > - trace_block_split(orig_bio, bio->bi_iter.bi_sector); > - submit_bio_noacct(bio); > + new_bio = bio_alloc_clone(bio->bi_bdev, bio, GFP_NOIO, > + &md->queue->bio_split); Why not keep using bio_split() ? > + bio_trim(new_bio, bio_sectors(bio) - ci.sector_count, ci.sector_count); > + bio_trim(bio, 0, bio_sectors(bio) - ci.sector_count); > + bio_chain(new_bio, bio); > + trace_block_split(new_bio, new_bio->bi_iter.bi_sector); > + submit_bio_noacct(new_bio); > out: > - if (!orig_bio) > - orig_bio = bio; > - smp_store_release(&ci.io->orig_bio, orig_bio); > if (dm_io_flagged(ci.io, DM_IO_START_ACCT)) > dm_start_io_acct(ci.io, NULL); I tested this and it works. Need to check the accounting though. And I agree this is a lot cleaner :) -- Damien Le Moal Western Digital Research -- dm-devel mailing list dm-devel@xxxxxxxxxx https://listman.redhat.com/mailman/listinfo/dm-devel