__send_empty_flush() sends empty flush bios to every target in the dm_table. However, if the num_targets exceeds the number of block devices in the dm_table's device list, it could lead to multiple invocations of __send_duplicate_bios() for the same block device. Typically, a single thread sending numerous empty flush bios to one block device is redundant, as these bios are likely to be merged by the flush state machine. In scenarios where num_targets significantly outweighs the number of block devices, such behavior may result in a noteworthy decrease in performance. This issue can be reproduced using this command line: for i in {0..1023}; do echo $((8000*$i)) 8000 linear /dev/sda2 $((16384*$i)) done | dmsetup create example With this fix, a random write with fsync workload executed with the following fio command: fio --group_reporting --name=benchmark --filename=/dev/mapper/example \ --ioengine=sync --invalidate=1 --numjobs=16 --rw=randwrite \ --blocksize=4k --size=2G --time_based --runtime=30 --fdatasync=1 results in an increase from 857 KB/s to 30.8 MB/s of the write throughput (3580% increase). Signed-off-by: Yang Yang <yang.yang@xxxxxxxx> --- drivers/md/dm-linear.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/md/dm-linear.c b/drivers/md/dm-linear.c index 2d3e186ca87e..3e1a33b4d289 100644 --- a/drivers/md/dm-linear.c +++ b/drivers/md/dm-linear.c @@ -62,6 +62,7 @@ static int linear_ctr(struct dm_target *ti, unsigned int argc, char **argv) ti->num_discard_bios = 1; ti->num_secure_erase_bios = 1; ti->num_write_zeroes_bios = 1; + ti->flush_pass_around = 1; ti->private = lc; return 0; -- 2.34.1