On Mon, Dec 25, 2017 at 12:11:47PM +0200, Sagi Grimberg wrote:
> > This is a performance optimization that allows the hardware to work on
> > a command in parallel with the kernel's stats and timeout tracking.
>
> Curious to know what does this buy us?

blk_mq_start_request doesn't do anything to make the command ready to
submit to hardware, so this is pure software overhead, somewhere between
200 and 300 nanoseconds as far as I can measure (YMMV). We can post the
command to hardware before taking on this software overhead so the
hardware gets to fetch the command that much sooner.

One thing to remember is we need to ensure the driver can't complete the
request before starting it. The driver's existing locking does ensure that
with this patch, so it's just something to keep in mind if anything
should ever change in the future.
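
To make the ordering constraint concrete, here is a small user-space C
model of it. This is only a sketch, not the driver code: every name in it
(model_queue, model_request, model_start_request, model_try_complete,
model_queue_rq) is hypothetical, the boolean "locked" flag stands in for
whatever locking the driver really uses, and model_start_request stands in
for the bookkeeping that blk_mq_start_request does. It posts the command
first, simulates a completion arriving at the worst possible moment, and
shows the lock refusing it until the request has been started.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model types; these are not kernel structures. */
struct model_queue {
	bool locked;		/* stands in for the driver's queue lock */
};

struct model_request {
	bool posted;		/* command is visible to the "hardware" */
	bool started;		/* software bookkeeping has been done */
	bool completed;		/* completion path has run */
};

/* Stand-in for the stats and timeout tracking done when starting a request. */
static void model_start_request(struct model_request *rq)
{
	rq->started = true;
}

/*
 * Completion path. Returns false if it cannot run yet, either because the
 * submission path holds the lock or because nothing has been posted.
 * A real handler would spin or block on the lock instead of returning.
 */
static bool model_try_complete(struct model_queue *q, struct model_request *rq)
{
	if (q->locked)
		return false;
	if (!rq->posted)
		return false;

	/* The invariant: a request is never completed before it is started. */
	assert(rq->started);
	rq->completed = true;
	return true;
}

/*
 * Submission path: post to hardware first, do the bookkeeping second,
 * both under the lock. Returns whether the simulated early completion
 * managed to run (it should not).
 */
static bool model_queue_rq(struct model_queue *q, struct model_request *rq)
{
	bool early;

	q->locked = true;

	/* Post the command; the hardware may start fetching it now. */
	rq->posted = true;

	/* Worst case: the device finishes immediately and completion fires here. */
	early = model_try_complete(q, rq);

	/* Software overhead that now overlaps with the hardware's work. */
	model_start_request(rq);

	q->locked = false;
	return early;
}
```

With a zero-initialized queue and request, model_queue_rq() returns false
because the early completion attempt is refused while the lock is held, and
a later model_try_complete() succeeds. If the lock were dropped between
posting and starting, the assert in the completion path is what would trip.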