On Thu, 14 Feb 2019, Mike Snitzer wrote:

> On Thu, Feb 14 2019 at 10:00am -0500,
> Mikulas Patocka <mpatocka@xxxxxxxxxx> wrote:
>
> > This patch improves performance of dm-linear and dm-striped targets.
> > Device mapper copies the whole bio and passes it to the lower layer. This
> > copying may be avoided in special cases.
> >
> > This patch changes the logic so that instead of copying the bio we
> > allocate a structure dm_noclone (it has only 4 entries), save the values
> > bi_end_io and bi_private in it, overwrite these values in the bio and pass
> > the bio to the lower block device.
> >
> > When the bio is finished, the function noclone_endio restores the values
> > bi_end_io and bi_private and passes the bio to the original bi_end_io
> > function.
> >
> > This optimization can only be done by dm-linear and dm-striped targets;
> > the target can opt in by setting ti->no_clone = true.
> >
> > Performance improvement:
> >
> > # modprobe brd rd_size=1048576
> > # dd if=/dev/zero of=/dev/ram0 bs=1M oflag=direct
> > # dmsetup create lin --table "0 2097152 linear /dev/ram0 0"
> > # fio --ioengine=psync --iodepth=1 --rw=read --bs=512 --direct=1 --numjobs=12 --time_based --runtime=10 --group_reporting --name=/dev/mapper/lin
> >
> > x86-64, 2x six-core
> > /dev/ram0                                       2449MiB/s
> > /dev/mapper/lin 5.0-rc without optimization     1970MiB/s
> > /dev/mapper/lin 5.0-rc with optimization        2238MiB/s
> >
> > arm64, quad core:
> > /dev/ram0                                       457MiB/s
> > /dev/mapper/lin 5.0-rc without optimization     325MiB/s
> > /dev/mapper/lin 5.0-rc with optimization        364MiB/s
> >
> > Signed-off-by: Mikulas Patocka <mpatocka@xxxxxxxxxx>
>
> Nice performance improvement.  But each device should have its own
> mempool for dm_noclone + front padding.  So it should be wired into
> dm_alloc_md_mempools().

We don't need to use mempools - if the slab allocation fails, we fall back
to the cloning path that has mempools.
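
To illustrate the save/overwrite/restore scheme and the allocation-failure
fallback discussed above, here is a minimal user-space sketch. It is not the
patch itself: struct bio is mocked with only the two fields involved,
dm_noclone is reduced to the two saved values, malloc stands in for the slab
allocation, and noclone_submit, its fail_alloc parameter and owner_endio are
hypothetical names introduced only for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct bio;
typedef void (bio_end_io_t)(struct bio *);

/* Mock of struct bio: only the two fields the scheme touches. */
struct bio {
	bio_end_io_t *bi_end_io;
	void *bi_private;
};

/* Reduced, hypothetical dm_noclone: just the saved completion state. */
struct dm_noclone {
	bio_end_io_t *orig_bi_end_io;
	void *orig_bi_private;
};

/* Restore the saved values, free the bookkeeping, call the original handler. */
static void noclone_endio(struct bio *bio)
{
	struct dm_noclone *nc = bio->bi_private;

	assert(nc != NULL);
	bio->bi_end_io = nc->orig_bi_end_io;
	bio->bi_private = nc->orig_bi_private;
	free(nc);
	if (bio->bi_end_io)
		bio->bi_end_io(bio);
}

/*
 * Try the no-clone path. Returns false if the allocation fails; the bio
 * is then left untouched and the caller would use the cloning path.
 * fail_alloc is a hook for this example that simulates allocation failure.
 */
static bool noclone_submit(struct bio *bio, bool fail_alloc)
{
	struct dm_noclone *nc = fail_alloc ? NULL : malloc(sizeof(*nc));

	if (!nc)
		return false;
	nc->orig_bi_end_io = bio->bi_end_io;
	nc->orig_bi_private = bio->bi_private;
	bio->bi_end_io = noclone_endio;
	bio->bi_private = nc;
	return true;
}

/* Example owner completion: bi_private points at an int counter. */
static void owner_endio(struct bio *bio)
{
	++*(int *)bio->bi_private;
}
```

A caller would set up a bio with owner_endio, call noclone_submit(), and
complete the bio through bio->bi_end_io; after completion the bio again
carries the owner's bi_end_io and bi_private. When noclone_submit() returns
false, nothing in the bio has changed, so falling back is safe.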

> It is fine if you don't actually deal with supporting per-bio-data in
> this patch, but a follow-on patch to add support for noclone-based
> per-bio-data shouldn't be expected to refactor the location of the
> mempool allocation (module vs per-device granularity).
>
> Mike

I tried to use per-bio-data and other features - and it makes the
structure dm_noclone and function noclone_endio grow:

#define DM_NOCLONE_MAGIC 9693664

struct dm_noclone {
	struct mapped_device *md;
	struct dm_target *ti;
	struct bio *bio;
	struct bvec_iter orig_bi_iter;
	bio_end_io_t *orig_bi_end_io;
	void *orig_bi_private;
	unsigned long start_time;
	/* ... per-bio data ... */
	/* DM_NOCLONE_MAGIC */
};

And this growth degrades performance on linear target - from 2238MiB/s to
2145MiB/s.

Mikulas

--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel