On Tue, Apr 12, 2022 at 09:28:46AM +0900, Damien Le Moal wrote: > On 4/12/22 09:09, Ming Lei wrote: > > On Tue, Apr 12, 2022 at 08:33:04AM +0900, Damien Le Moal wrote: > >> On 4/11/22 23:18, Ming Lei wrote: > >>>>>> This fixes the issue: > >>>>>> > >>>>>> diff --git a/drivers/md/dm.c b/drivers/md/dm.c > >>>>>> index 3c5fad7c4ee6..3dd6735450c5 100644 > >>>>>> --- a/drivers/md/dm.c > >>>>>> +++ b/drivers/md/dm.c > >>>>>> @@ -581,7 +581,7 @@ static struct dm_io *alloc_io(struct mapped_device > >>>>>> *md, struct bio *bio) > >>>>>> io->status = 0; > >>>>>> atomic_set(&io->io_count, 1); > >>>>>> this_cpu_inc(*md->pending_io); > >>>>>> - io->orig_bio = NULL; > >>>>>> + io->orig_bio = bio; > >>>>>> io->md = md; > >>>>>> io->map_task = current; > >>>>>> spin_lock_init(&io->lock); > >>>>>> > >>>>>> Otherwise, the dm-zone.c code sees a NULL orig_bio. > >>>>>> However, this change may be messing up the bio accounting. Need to check that. > >>>>> > >>>>> Looks it is one recent regression since: > >>>>> > >>>>> commit 0fbb4d93b38b ("dm: add dm_submit_bio_remap interface") > >>>> > >>>> Yep, saw that. Problem is, I really do not understand that change setting > >>>> io->orig_bio *after* __map_bio() is called. It seems that the accounting > >>>> is done on each fragment of the orig_bio instead of once for the entire > >>>> BIO... So my "fix" above seems wrong. Apart from passing along orig_bio as > >>>> an argument to __map_bio() from __split_and_process_bio(), I do not think > >>>> my change is correct. Thoughts ? > >>> > >>> Frankly speaking, both changing ->orig_bio after split and setting ->orig_bio > >>> after ->map() looks ugly & tricky, and the following change should avoid the > >>> issue, meantime simplify dm accounting a bit: > >>> > >>> diff --git a/drivers/md/dm.c b/drivers/md/dm.c > >>> index 3c5fad7c4ee6..f1fe83113608 100644 > >>> --- a/drivers/md/dm.c > >>> +++ b/drivers/md/dm.c > >>> @@ -581,7 +581,7 @@ static struct dm_io *alloc_io(struct mapped_device *md, struct bio *bio) > >>> io->status = 0; > >>> atomic_set(&io->io_count, 1); > >>> this_cpu_inc(*md->pending_io); > >>> - io->orig_bio = NULL; > >>> + io->orig_bio = bio; > >>> io->md = md; > >>> io->map_task = current; > >>> spin_lock_init(&io->lock); > >>> @@ -1223,19 +1223,11 @@ void dm_submit_bio_remap(struct bio *clone, struct bio *tgt_clone) > >>> * Account io->origin_bio to DM dev on behalf of target > >>> * that took ownership of IO with DM_MAPIO_SUBMITTED. > >>> */ > >>> - if (io->map_task == current) { > >>> + if (io->map_task == current) > >>> /* Still in target's map function */ > >>> dm_io_set_flag(io, DM_IO_START_ACCT); > >>> - } else { > >>> - /* > >>> - * Called by another thread, managed by DM target, > >>> - * wait for dm_split_and_process_bio() to store > >>> - * io->orig_bio > >>> - */ > >>> - while (unlikely(!smp_load_acquire(&io->orig_bio))) > >>> - msleep(1); > >>> + else > >> > >> Curly brackets around the else here. > >> > >>> dm_start_io_acct(io, clone); > >>> - } > >>> > >>> __dm_submit_bio_remap(tgt_clone, disk_devt(io->md->disk), > >>> tio->old_sector); > >>> @@ -1562,7 +1554,7 @@ static void dm_split_and_process_bio(struct mapped_device *md, > >>> struct dm_table *map, struct bio *bio) > >>> { > >>> struct clone_info ci; > >>> - struct bio *orig_bio = NULL; > >>> + struct bio *new_bio = NULL; > >>> int error = 0; > >>> > >>> init_clone_info(&ci, md, map, bio); > >>> @@ -1578,22 +1570,14 @@ static void dm_split_and_process_bio(struct mapped_device *md, > >>> if (error || !ci.sector_count) > >>> goto out; > >>> > >>> - /* > >>> - * Remainder must be passed to submit_bio_noacct() so it gets handled > >>> - * *after* bios already submitted have been completely processed. > >>> - * We take a clone of the original to store in ci.io->orig_bio to be > >>> - * used by dm_end_io_acct() and for dm_io_complete() to use for > >>> - * completion handling. > >>> - */ > >> > >> This comment should remain with some adjustment. > > > > Fine, just felt the approach is very straightforward. > > > >> > >>> - orig_bio = bio_split(bio, bio_sectors(bio) - ci.sector_count, > >>> - GFP_NOIO, &md->queue->bio_split); > >>> - bio_chain(orig_bio, bio); > >>> - trace_block_split(orig_bio, bio->bi_iter.bi_sector); > >>> - submit_bio_noacct(bio); > >>> + new_bio = bio_alloc_clone(bio->bi_bdev, bio, GFP_NOIO, > >>> + &md->queue->bio_split); > >> > >> Why not keep using bio_split() ? > > > > With bio_split(), 'bio' actually tracks the remainder, and the returned > > 'orig_bio' tracks the part for current target io, so ->orig_bio has to > > be updated in this way. > > > > With bio_alloc_clone() and bio_trim(), ->orig_bio needn't to be > > changed, and assigning it in alloc_io() works perfectly. > > OK. Got it. > > >>> + bio_trim(new_bio, bio_sectors(bio) - ci.sector_count, ci.sector_count); > >>> + bio_trim(bio, 0, bio_sectors(bio) - ci.sector_count); > >>> + bio_chain(new_bio, bio); > >>> + trace_block_split(new_bio, new_bio->bi_iter.bi_sector); > >>> + submit_bio_noacct(new_bio); > >>> out: > >>> - if (!orig_bio) > >>> - orig_bio = bio; > >>> - smp_store_release(&ci.io->orig_bio, orig_bio); > >>> if (dm_io_flagged(ci.io, DM_IO_START_ACCT)) > >>> dm_start_io_acct(ci.io, NULL); > >> > >> I tested this and it works. Need to check the accounting though. > >> And I agree this is a lot cleaner :) > > > > BTW, the cloned bio for split is just for accounting purpose, if > > ->bi_iter.bi_sector & ->bi_iter.bi_size can be stored in 'struct dm_io', > > the cloned bio can be avoided, but code may become not readable as > > before. > > The BIO op would be needed too for remapping zone append to regular writes > when zone append emulation is enabled. That is actually why > dm_zone_map_bio() needs the original BIO, to have the unmodified BIO op > since the clone op may not be the same. clone inherits the original bio's op. I meant to store the real bio from FS as ->orig_bio always in alloc_io(), and simply trim it in case of split & re-submission, meantime store orig_bio's ->bi_sector & ->size into 'dm_io' for account purpose. But it looks a bit complicated and messy. Wrt. dm zone, I'd suggest to double check anywhere orig bio is used, since only part of orig bio may be mapped in case of splitting, which is actually post-split. If bio split won't happen for dm-zone, your patch is fine. But I guess it isn't true for dm-zone. > > So I think this fix+cleanup as is is good for now. Will you send a proper > patch ? Not yet, the fix+cleanup patch actually breaks recent dm io polling, and I don't figure out one solution yet. Thanks, Ming -- dm-devel mailing list dm-devel@xxxxxxxxxx https://listman.redhat.com/mailman/listinfo/dm-devel