On 4/12/22 09:09, Ming Lei wrote: > On Tue, Apr 12, 2022 at 08:33:04AM +0900, Damien Le Moal wrote: >> On 4/11/22 23:18, Ming Lei wrote: >>>>>> This fixes the issue: >>>>>> >>>>>> diff --git a/drivers/md/dm.c b/drivers/md/dm.c >>>>>> index 3c5fad7c4ee6..3dd6735450c5 100644 >>>>>> --- a/drivers/md/dm.c >>>>>> +++ b/drivers/md/dm.c >>>>>> @@ -581,7 +581,7 @@ static struct dm_io *alloc_io(struct mapped_device >>>>>> *md, struct bio *bio) >>>>>> io->status = 0; >>>>>> atomic_set(&io->io_count, 1); >>>>>> this_cpu_inc(*md->pending_io); >>>>>> - io->orig_bio = NULL; >>>>>> + io->orig_bio = bio; >>>>>> io->md = md; >>>>>> io->map_task = current; >>>>>> spin_lock_init(&io->lock); >>>>>> >>>>>> Otherwise, the dm-zone.c code sees a NULL orig_bio. >>>>>> However, this change may be messing up the bio accounting. Need to check that. >>>>> >>>>> Looks it is one recent regression since: >>>>> >>>>> commit 0fbb4d93b38b ("dm: add dm_submit_bio_remap interface") >>>> >>>> Yep, saw that. Problem is, I really do not understand that change setting >>>> io->orig_bio *after* __map_bio() is called. It seems that the accounting >>>> is done on each fragment of the orig_bio instead of once for the entire >>>> BIO... So my "fix" above seems wrong. Apart from passing along orig_bio as >>>> an argument to __map_bio() from __split_and_process_bio(), I do not think >>>> my change is correct. Thoughts ? >>> >>> Frankly speaking, both changing ->orig_bio after split and setting ->orig_bio >>> after ->map() looks ugly & tricky, and the following change should avoid the >>> issue, meantime simplify dm accounting a bit: >>> >>> diff --git a/drivers/md/dm.c b/drivers/md/dm.c >>> index 3c5fad7c4ee6..f1fe83113608 100644 >>> --- a/drivers/md/dm.c >>> +++ b/drivers/md/dm.c >>> @@ -581,7 +581,7 @@ static struct dm_io *alloc_io(struct mapped_device *md, struct bio *bio) >>> io->status = 0; >>> atomic_set(&io->io_count, 1); >>> this_cpu_inc(*md->pending_io); >>> - io->orig_bio = NULL; >>> + io->orig_bio = bio; >>> io->md = md; >>> io->map_task = current; >>> spin_lock_init(&io->lock); >>> @@ -1223,19 +1223,11 @@ void dm_submit_bio_remap(struct bio *clone, struct bio *tgt_clone) >>> * Account io->origin_bio to DM dev on behalf of target >>> * that took ownership of IO with DM_MAPIO_SUBMITTED. >>> */ >>> - if (io->map_task == current) { >>> + if (io->map_task == current) >>> /* Still in target's map function */ >>> dm_io_set_flag(io, DM_IO_START_ACCT); >>> - } else { >>> - /* >>> - * Called by another thread, managed by DM target, >>> - * wait for dm_split_and_process_bio() to store >>> - * io->orig_bio >>> - */ >>> - while (unlikely(!smp_load_acquire(&io->orig_bio))) >>> - msleep(1); >>> + else >> >> Curly brackets around the else here. >> >>> dm_start_io_acct(io, clone); >>> - } >>> >>> __dm_submit_bio_remap(tgt_clone, disk_devt(io->md->disk), >>> tio->old_sector); >>> @@ -1562,7 +1554,7 @@ static void dm_split_and_process_bio(struct mapped_device *md, >>> struct dm_table *map, struct bio *bio) >>> { >>> struct clone_info ci; >>> - struct bio *orig_bio = NULL; >>> + struct bio *new_bio = NULL; >>> int error = 0; >>> >>> init_clone_info(&ci, md, map, bio); >>> @@ -1578,22 +1570,14 @@ static void dm_split_and_process_bio(struct mapped_device *md, >>> if (error || !ci.sector_count) >>> goto out; >>> >>> - /* >>> - * Remainder must be passed to submit_bio_noacct() so it gets handled >>> - * *after* bios already submitted have been completely processed. >>> - * We take a clone of the original to store in ci.io->orig_bio to be >>> - * used by dm_end_io_acct() and for dm_io_complete() to use for >>> - * completion handling. >>> - */ >> >> This comment should remain with some adjustment. > > Fine, just felt the approach is very straightforward. > >> >>> - orig_bio = bio_split(bio, bio_sectors(bio) - ci.sector_count, >>> - GFP_NOIO, &md->queue->bio_split); >>> - bio_chain(orig_bio, bio); >>> - trace_block_split(orig_bio, bio->bi_iter.bi_sector); >>> - submit_bio_noacct(bio); >>> + new_bio = bio_alloc_clone(bio->bi_bdev, bio, GFP_NOIO, >>> + &md->queue->bio_split); >> >> Why not keep using bio_split() ? > > With bio_split(), 'bio' actually tracks the remainder, and the returned > 'orig_bio' tracks the part for current target io, so ->orig_bio has to > be updated in this way. > > With bio_alloc_clone() and bio_trim(), ->orig_bio needn't to be > changed, and assigning it in alloc_io() works perfectly. OK. Got it. >>> + bio_trim(new_bio, bio_sectors(bio) - ci.sector_count, ci.sector_count); >>> + bio_trim(bio, 0, bio_sectors(bio) - ci.sector_count); >>> + bio_chain(new_bio, bio); >>> + trace_block_split(new_bio, new_bio->bi_iter.bi_sector); >>> + submit_bio_noacct(new_bio); >>> out: >>> - if (!orig_bio) >>> - orig_bio = bio; >>> - smp_store_release(&ci.io->orig_bio, orig_bio); >>> if (dm_io_flagged(ci.io, DM_IO_START_ACCT)) >>> dm_start_io_acct(ci.io, NULL); >> >> I tested this and it works. Need to check the accounting though. >> And I agree this is a lot cleaner :) > > BTW, the cloned bio for split is just for accounting purpose, if > ->bi_iter.bi_sector & ->bi_iter.bi_size can be stored in 'struct dm_io', > the cloned bio can be avoided, but code may become not readable as > before. The BIO op would be needed too for remapping zone append to regular writes when zone append emulation is enabled. That is actually why dm_zone_map_bio() needs the original BIO, to have the unmodified BIO op since the clone op may not be the same. So I think this fix+cleanup as is is good for now. Will you send a proper patch ? > > > Thanks, > Ming > -- Damien Le Moal Western Digital Research -- dm-devel mailing list dm-devel@xxxxxxxxxx https://listman.redhat.com/mailman/listinfo/dm-devel