On 10/13/21 11:20 PM, Christoph Hellwig wrote:
> On Tue, Oct 12, 2021 at 12:12:39PM -0600, Jens Axboe wrote:
>> It's expensive to browse the whole plug list for merge opportunities at
>> the IOPS rates that modern storage can do. For sequential IO, the one-hit
>> cached merge should suffice on fast drives, and for rotational storage the
>> IO scheduler will do a more exhaustive lookup based merge anyway.
>
> I don't really want to argue, but maybe some actual measurements to
> support the above claims would be useful in the commit log?

Sure, I do need to pad the commit messages a bit. One example: running an
IOPS bound workload caps out at ~5.6M IOPS, and if you check the profile,
this is at the top of the list:

  Overhead  Command   Shared Object     Symbol
+  20.89%   io_uring  [kernel.vmlinux]  [k] blk_attempt_plug_merge
+   4.98%   io_uring  [kernel.vmlinux]  [k] io_submit_sqes
+   4.78%   io_uring  [kernel.vmlinux]  [k] blkdev_direct_IO
+   4.61%   io_uring  [kernel.vmlinux]  [k] blk_mq_submit_bio
+   3.67%   io_uring  [nvme]            [k] nvme_queue_rq

With merging disabled or the patch applied, we run at ~7.4M IOPS instead.
That's about a 33% improvement from killing a silly merge loop that isn't
even marked as an expensive merge.

I'll rework the patch; we can probably retain it as a last-insert kind of
merge point. But browsing the whole plug list for a merge candidate is
stupidity, regardless of the class of workload and storage.

-- 
Jens Axboe
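
For illustration, here is a minimal userspace sketch of the tradeoff under
discussion; the types and helpers (struct plug, back_mergeable(),
plug_merge_scan(), plug_merge_last()) are hypothetical stand-ins, not the
kernel's actual data structures or API. Scanning the whole plug list costs
O(n) per submitted bio, while a last-insert merge point checks only the
most recently queued request:

  #include <stdbool.h>
  #include <stddef.h>

  struct request {
          unsigned long long sector;  /* start sector */
          unsigned int nr_sectors;    /* length in sectors */
          struct request *next;       /* singly linked plug list */
  };

  struct plug {
          struct request *head;       /* oldest plugged request */
          struct request *tail;       /* most recently added request */
  };

  /* Back-merge test: the new bio starts exactly where rq ends. */
  static bool back_mergeable(const struct request *rq,
                             unsigned long long sector)
  {
          return rq->sector + rq->nr_sectors == sector;
  }

  /* O(n) per bio: walk every plugged request looking for a merge,
   * the kind of exhaustive scan being complained about above. */
  static bool plug_merge_scan(struct plug *plug,
                              unsigned long long sector, unsigned int nr)
  {
          for (struct request *rq = plug->head; rq; rq = rq->next) {
                  if (back_mergeable(rq, sector)) {
                          rq->nr_sectors += nr;  /* fold bio into rq */
                          return true;
                  }
          }
          return false;
  }

  /* O(1) per bio: only the last-inserted request is a candidate,
   * the "last-insert kind of merge point" suggested above. */
  static bool plug_merge_last(struct plug *plug,
                              unsigned long long sector, unsigned int nr)
  {
          if (plug->tail && back_mergeable(plug->tail, sector)) {
                  plug->tail->nr_sectors += nr;
                  return true;
          }
          return false;
  }

For sequential IO the last-inserted request is exactly where a back merge
would land anyway, so the O(1) check gives up little while avoiding a
per-bio list walk at multi-million IOPS rates.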