On 3/8/22 7:43 PM, Ming Lei wrote:
> On Tue, Mar 08, 2022 at 07:17:13PM -0600, Mike Christie wrote:
>> On 3/8/22 6:53 PM, Ming Lei wrote:
>>> On Mon, Mar 07, 2022 at 06:39:54PM -0600, Mike Christie wrote:
>>>> The software iscsi driver's queuecommand can block, and taking the
>>>> extra hop from kblockd to its workqueue results in a performance
>>>> hit. Allowing it to set BLK_MQ_F_BLOCKING and transmit from that
>>>> context directly results in a 20-30% improvement in IOPs for
>>>> workloads like:
>>>>
>>>> fio --filename=/dev/sdb --direct=1 --rw=randrw --bs=4k \
>>>>     --ioengine=libaio --iodepth=128 --numjobs=1
>>>>
>>>> and for all write workloads.
>>>
>>> This single patch shouldn't make any difference for iscsi, so please
>>> make it the last one if performance improvement data is provided in
>>> the commit log.
>>
>> Ok.
>>
>>> Also, is there a performance effect for other workloads, such as
>>> multiple jobs? iscsi is SQ hardware, so if the driver is blocked in
>>> ->queuecommand() via BLK_MQ_F_BLOCKING, other contexts can't submit
>>> IO to the scsi ML any more.
>>
>> If you mean multiple jobs running on the same connection/session,
>> then they are all serialized now. A connection can only do 1 cmd at
>> a time. There's a big mutex around it in the network layer, so
>> multiple jobs just suck no matter what.
>
> I guess one block device can only bind to one iscsi connection, given
> the 1 cmd per connection limit, so it looks like multiple jobs are
> fine.
>
>> If you mean multiple jobs from different connections/sessions, then
>> the iscsi code with this patchset blocks only because the network
>> layer takes a mutex for a short time. We configure it to not block
>> for things like socket space or memory allocations, we do zero-copy
>> IO normally, etc., so it's quick.
>>
>> We also can do up to the workqueue's max_active limit worth of
>> calls, so other things can normally send IO. We haven't found a need
>> to increase it yet.
>
> I meant that hctx->run_work is required for blk-mq to dispatch IO.
> iscsi is an SQ HBA, so there is only a single work_struct. If one
> context is blocked in ->queue_rq or ->queuecommand, other contexts
> can't submit IO to the driver any more.

I see what you mean. With the current code, we have the same issue
already. We have 1 work_struct per connection/session and one
connection/session per scsi_host.

Basically, the iscsi protocol and socket layer only allow us to send
1 command per connection at a time (you can't have 2 threads doing
sendmsg/sendpage). It's why nvme/tcp is a lot better. It makes N tcp
connections and each hwctx can use a different one.
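
For context, a minimal sketch (not from this patchset) of how a blk-mq
driver opts into a blocking ->queue_rq() by setting BLK_MQ_F_BLOCKING
in its tag set. BLK_MQ_F_BLOCKING is the real blk-mq flag being
discussed above; the example_* names are hypothetical:

/*
 * Sketch: BLK_MQ_F_BLOCKING tells blk-mq that ->queue_rq() may sleep,
 * so dispatch always happens from process context (the hctx's
 * run_work) instead of directly from the submitter's context.
 */
#include <linux/blk-mq.h>
#include <linux/numa.h>
#include <linux/string.h>

static int example_setup_tag_set(struct blk_mq_tag_set *set,
				 const struct blk_mq_ops *ops)
{
	memset(set, 0, sizeof(*set));
	set->ops = ops;
	set->nr_hw_queues = 1;		/* SQ hardware: one hctx, one run_work */
	set->queue_depth = 128;
	set->numa_node = NUMA_NO_NODE;
	set->flags = BLK_MQ_F_BLOCKING;	/* ->queue_rq() may block */
	return blk_mq_alloc_tag_set(set);
}

With nr_hw_queues = 1, that single run_work serializes dispatch, which
is the contention Ming describes; nvme/tcp sidesteps it by exposing one
hw queue per TCP connection.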