On Thu, May 16, 2024 at 10:49:55PM +0200, Mikulas Patocka wrote: > Device mapper sends flush bios to all the targets and the targets send it > to the underlying device. That may be inefficient, for example if a table > contains 10 linear targets pointing to the same physical device, then > device mapper would send 10 flush bios to that device - despite the fact > that only one bio would be sufficient. > > This commit optimizes the flush behavior. It introduces a per-target > variable flush_pass_around - it is set when the target supports flush > optimization - currently, the dm-linear and dm-stripe targets support it. > When all the targets in a table have flush_pass_around, flush_pass_around > on the table is set. __send_empty_flush tests if the table has > flush_pass_around - and if it has, no flush bios are sent to the targets > and the list dm_table->devices is iterated and the flush bios are sent to > each member of the list. What does "pass around" mean? Seems like an awkward name for this. (Naming can be hard, I don't have better suggestions at the moment.) > Signed-off-by: Mikulas Patocka <mpatocka@xxxxxxxxxx> > Reported-by: Yang Yang <yang.yang@xxxxxxxx> > > --- > drivers/md/dm-core.h | 4 ++- > drivers/md/dm-linear.c | 1 > drivers/md/dm-stripe.c | 1 > drivers/md/dm-table.c | 4 +++ > drivers/md/dm.c | 47 +++++++++++++++++++++++++++++------------- > include/linux/device-mapper.h | 5 ++++ > 6 files changed, 47 insertions(+), 15 deletions(-) > > Index: linux-2.6/drivers/md/dm-core.h > =================================================================== > --- linux-2.6.orig/drivers/md/dm-core.h 2024-05-15 16:56:49.000000000 +0200 > +++ linux-2.6/drivers/md/dm-core.h 2024-05-15 16:56:49.000000000 +0200 > @@ -206,7 +206,9 @@ struct dm_table { > > bool integrity_supported:1; > bool singleton:1; > - unsigned integrity_added:1; > + bool integrity_added:1; > + /* set if all the targets in the table have "flush_pass_around" set */ > + bool flush_pass_around:1; > > /* > * Indicates the rw permissions for the new logical device. This > Index: linux-2.6/drivers/md/dm-linear.c > =================================================================== > --- linux-2.6.orig/drivers/md/dm-linear.c 2024-05-15 16:56:49.000000000 +0200 > +++ linux-2.6/drivers/md/dm-linear.c 2024-05-15 16:56:49.000000000 +0200 > @@ -62,6 +62,7 @@ static int linear_ctr(struct dm_target * > ti->num_discard_bios = 1; > ti->num_secure_erase_bios = 1; > ti->num_write_zeroes_bios = 1; > + ti->flush_pass_around = true; > ti->private = lc; > return 0; > > Index: linux-2.6/drivers/md/dm-stripe.c > =================================================================== > --- linux-2.6.orig/drivers/md/dm-stripe.c 2024-05-15 16:56:49.000000000 +0200 > +++ linux-2.6/drivers/md/dm-stripe.c 2024-05-15 16:56:49.000000000 +0200 > @@ -157,6 +157,7 @@ static int stripe_ctr(struct dm_target * > ti->num_discard_bios = stripes; > ti->num_secure_erase_bios = stripes; > ti->num_write_zeroes_bios = stripes; > + ti->flush_pass_around = true; > > sc->chunk_size = chunk_size; > if (chunk_size & (chunk_size - 1)) > Index: linux-2.6/drivers/md/dm-table.c > =================================================================== > --- linux-2.6.orig/drivers/md/dm-table.c 2024-05-15 16:56:49.000000000 +0200 > +++ linux-2.6/drivers/md/dm-table.c 2024-05-15 16:56:49.000000000 +0200 > @@ -160,6 +160,7 @@ int dm_table_create(struct dm_table **re > t->type = DM_TYPE_NONE; > t->mode = mode; > t->md = md; > + t->flush_pass_around = 1; > *result = t; > return 0; > } Should be: t->flush_pass_around = true; > @@ -738,6 +739,9 @@ int dm_table_add_target(struct dm_table > if (ti->limit_swap_bios && !static_key_enabled(&swap_bios_enabled.key)) > static_branch_enable(&swap_bios_enabled); > > + if (!ti->flush_pass_around) > + t->flush_pass_around = false; > + > return 0; > > bad: > Index: linux-2.6/include/linux/device-mapper.h > =================================================================== > --- linux-2.6.orig/include/linux/device-mapper.h 2024-05-15 16:56:49.000000000 +0200 > +++ linux-2.6/include/linux/device-mapper.h 2024-05-15 16:56:49.000000000 +0200 > @@ -397,6 +397,11 @@ struct dm_target { > * bio_set_dev(). NOTE: ideally a target should _not_ need this. > */ > bool needs_bio_set_dev:1; > + > + /* > + * Set if the target supports flush optimization > + */ > + bool flush_pass_around:1; > }; How does a developer _know_ if a target can set this flag? Please elaborate on the requirements in this code comment. > > void *dm_per_bio_data(struct bio *bio, size_t data_size); > Index: linux-2.6/drivers/md/dm.c > =================================================================== > --- linux-2.6.orig/drivers/md/dm.c 2024-05-15 16:56:49.000000000 +0200 > +++ linux-2.6/drivers/md/dm.c 2024-05-16 20:06:32.000000000 +0200 > @@ -645,7 +645,7 @@ static struct bio *alloc_tio(struct clon > > /* Set default bdev, but target must bio_set_dev() before issuing IO */ > clone->bi_bdev = md->disk->part0; > - if (unlikely(ti->needs_bio_set_dev)) > + if (likely(ti != NULL) && unlikely(ti->needs_bio_set_dev)) > bio_set_dev(clone, md->disk->part0); > > if (len) { > @@ -1107,7 +1107,7 @@ static void clone_endio(struct bio *bio) > blk_status_t error = bio->bi_status; > struct dm_target_io *tio = clone_to_tio(bio); > struct dm_target *ti = tio->ti; > - dm_endio_fn endio = ti->type->end_io; > + dm_endio_fn endio = likely(ti != NULL) ? ti->type->end_io : NULL; > struct dm_io *io = tio->io; > struct mapped_device *md = io->md; > > @@ -1154,7 +1154,7 @@ static void clone_endio(struct bio *bio) > } > > if (static_branch_unlikely(&swap_bios_enabled) && > - unlikely(swap_bios_limit(ti, bio))) > + likely(ti != NULL) && unlikely(swap_bios_limit(ti, bio))) > up(&md->swap_bios_semaphore); > > free_tio(bio); What is it about this commit that makes it important to verify ti isn't NULL in the above 3 hunks? Should these NULL checks be factored out as a separate fix? Or can these hunks be dropped? > @@ -1566,17 +1566,36 @@ static void __send_empty_flush(struct cl > ci->sector_count = 0; > ci->io->tio.clone.bi_iter.bi_size = 0; > > - for (unsigned int i = 0; i < t->num_targets; i++) { > - unsigned int bios; > - struct dm_target *ti = dm_table_get_target(t, i); > - > - if (unlikely(ti->num_flush_bios == 0)) > - continue; > - > - atomic_add(ti->num_flush_bios, &ci->io->io_count); > - bios = __send_duplicate_bios(ci, ti, ti->num_flush_bios, > - NULL, GFP_NOWAIT); > - atomic_sub(ti->num_flush_bios - bios, &ci->io->io_count); > + if (!t->flush_pass_around) { > + for (unsigned int i = 0; i < t->num_targets; i++) { > + unsigned int bios; > + struct dm_target *ti = dm_table_get_target(t, i); > + > + if (unlikely(ti->num_flush_bios == 0)) > + continue; > + > + atomic_add(ti->num_flush_bios, &ci->io->io_count); > + bios = __send_duplicate_bios(ci, ti, ti->num_flush_bios, > + NULL, GFP_NOWAIT); > + atomic_sub(ti->num_flush_bios - bios, &ci->io->io_count); > + } > + } else { > + /* > + * Note that there's no need to grab t->devices_lock here > + * because the targets that support flush pass-around don't > + * modify the list of devices. > + */ > + struct list_head *devices = dm_table_get_devices(t); > + unsigned int len = 0; > + struct dm_dev_internal *dd; > + list_for_each_entry(dd, devices, list) { > + struct bio *clone; > + clone = alloc_tio(ci, NULL, 0, &len, GFP_NOIO); > + atomic_add(1, &ci->io->io_count); > + bio_set_dev(clone, dd->dm_dev->bdev); > + clone->bi_end_io = clone_endio; > + dm_submit_bio_remap(clone, NULL); > + } > } > > /* > > Still missing what "pass-around" is meant to convey given that you aren't passing around the same flush... you're cloning a new flush and issuing one per device. Probably worth explaining that's what you mean by "flush_pass_around" (both in commit header and elaborate in code)? Also, you're issuing a flush to _all_ devices in a table. Not just the data devices. .iterate_devices returns only the data devices. If/when there is a need to extend this feature to targets that have metadata devices (e.g. dm-thin, cache, etc): would it make sense to filter out non-data devices (by stepping through each target in the table and using iterate_devices)? Mike