From: Michael Kelley (LINUX) Sent: Thursday, June 23, 2022 8:28 AM
>
> In using fio for performance measurements of sequential I/O against a
> SCSI disk, I see a big difference in reads vs. writes. Reads get good
> merging, but writes get no merging. Here are the fio command lines:
>
> fio --bs=4k --ioengine=libaio --iodepth=64 --filename=/dev/sdc --direct=1 --numjobs=8
> --rw=read --name=readmerge
>
> fio --bs=4k --ioengine=libaio --iodepth=64 --filename=/dev/sdc --direct=1 --numjobs=8
> --rw=write --name=writemerge
>
> /sys/block/sdc/queue/scheduler is set to [none]. Linux kernel is 5.18.5.
>
> The code difference appears to be in blkdev_read_iter() vs.
> blkdev_write_iter(). The latter has blk_start_plug()/blk_finish_plug()
> while the former does not. As a result, blk_mq_submit_bio() puts
> write requests on the pluglist, which is flushed almost immediately.
> blk_mq_submit_bio() sends the read requests through
> blk_mq_try_issue_directly(), and they are later merged in a blk-mq
> software queue.
>
> For writes, the pluglist flush path is as follows:
>
> blk_finish_plug()
>   __blk_flush_plug()
>     blk_mq_flush_plug_list()
>       blk_mq_plug_issue_direct()
>         blk_mq_request_issue_directly()
>           __blk_mq_try_issue_directly()
>
> The last function is called with "bypass_insert" set to true, so if the
> request must wait for budget, the request doesn't go on a blk-mq
> software queue like reads do, and no merging happens.
>
> I don't know the blk-mq layer well enough to know what's
> supposed to be happening. Is it intentional to not do write merges
> in this case? Or did something get broken, and it wasn't noticed?
>

Gentle ping. Anyone have knowledge of whether write merging
*should* be happening in this case?

Michael
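
For anyone wanting to reproduce the read vs. write merge difference described above, the merge counters can be read from /sys/block/sdc/stat before and after each fio run. The following is only a minimal Python sketch, not something taken from the original measurements. It assumes the stat layout described in the kernel's block stat documentation (second field is read merges, sixth field is write merges), and the helper names parse_stat(), delta(), and read_counters() are hypothetical.

```python
from pathlib import Path

# 0-based field positions in /sys/block/<dev>/stat (assumed layout):
# read I/Os, read merges, read sectors, read ticks,
# write I/Os, write merges, write sectors, write ticks, ...
READ_IOS, READ_MERGES, WRITE_IOS, WRITE_MERGES = 0, 1, 4, 5


def parse_stat(text):
    """Extract I/O and merge counters from the contents of a block stat file."""
    fields = [int(f) for f in text.split()]
    return {
        "read_ios": fields[READ_IOS],
        "read_merges": fields[READ_MERGES],
        "write_ios": fields[WRITE_IOS],
        "write_merges": fields[WRITE_MERGES],
    }


def delta(before, after):
    """Return how much each counter grew between two snapshots."""
    return {key: after[key] - before[key] for key in before}


def read_counters(dev="sdc"):
    """Snapshot the counters for one device (device name is an assumption)."""
    return parse_stat(Path(f"/sys/block/{dev}/stat").read_text())


# Intended use (requires the device to exist, so it is left as a comment):
#   before = read_counters("sdc")
#   ... run one of the fio command lines ...
#   print(delta(before, read_counters("sdc")))
```

Taking one snapshot before and one after each of the two fio jobs and comparing the "read_merges" and "write_merges" deltas should show whether writes are being merged at all.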