On 3/8/22 7:43 PM, Ming Lei wrote:
> On Tue, Mar 08, 2022 at 07:17:13PM -0600, Mike Christie wrote:
>> On 3/8/22 6:53 PM, Ming Lei wrote:
>>> On Mon, Mar 07, 2022 at 06:39:54PM -0600, Mike Christie wrote:
>>>> The software iscsi driver's queuecommand can block, and taking the
>>>> extra hop from kblockd to its workqueue results in a performance
>>>> hit. Allowing it to set BLK_MQ_F_BLOCKING and transmit from that
>>>> context directly results in a 20-30% improvement in IOPs for
>>>> workloads like:
>>>>
>>>> fio --filename=/dev/sdb --direct=1 --rw=randrw --bs=4k \
>>>>     --ioengine=libaio --iodepth=128 --numjobs=1
>>>>
>>>> and for all write workloads.
>>>
>>> This single patch shouldn't make any difference for iscsi, so please
>>> make it the last one if performance improvement data is provided in
>>> the commit log.
>>
>> Ok.
>>
>>> Also, is there a performance effect for other workloads, such as
>>> multiple jobs? iscsi is SQ hardware, so if the driver is blocked in
>>> ->queuecommand() via BLK_MQ_F_BLOCKING, other contexts can't submit
>>> IO to the scsi ML any more.
>>
>> If you mean multiple jobs running on the same connection/session,
>> then they are all serialized now. A connection can only do 1 cmd at
>> a time. There's a big mutex around it in the network layer, so
>> multiple jobs just suck no matter what.
>
> I guess one block device can only bind to one iscsi connection, given
> the 1 cmd per connection limit, so it looks like multiple jobs are
> fine.
>
>> If you mean multiple jobs from different connections/sessions, then
>> the iscsi code with this patchset blocks only because the network
>> layer takes a mutex for a short time. We configure it to not block
>> for things like socket space or memory allocations, we do zero-copy
>> IO normally, etc., so it's quick.
>>
>> We also can do up to the workqueue's max_active limit worth of
>> calls, so other things can normally send IO. We haven't found a need
>> to increase it yet.
>
> I meant that hctx->run_work is required for blk-mq to dispatch IO.
> iscsi is an SQ HBA, so there is only a single work_struct. If one
> context is blocked in ->queue_rq or ->queuecommand, other contexts
> can't submit IO to the driver any more.

I see what you mean. With the current code, we have the same issue
already. We have 1 work_struct per connection/session and one
connection/session per scsi_host.

Basically, the iscsi protocol and socket layer only allow us to send
1 command per connection at a time (you can't have 2 threads doing
sendmsg/sendpage). It's why nvme/tcp is a lot better. It makes N tcp
connections and each hwctx can use a different one.
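
For context, a minimal sketch (not from this patchset) of how a blk-mq
driver opts into a blocking ->queue_rq() by setting BLK_MQ_F_BLOCKING
in its tag set. BLK_MQ_F_BLOCKING is the real blk-mq flag being
discussed above; the example_* names are hypothetical:

/*
 * Sketch: BLK_MQ_F_BLOCKING tells blk-mq that ->queue_rq() may sleep,
 * so dispatch always happens from process context (the hctx's
 * run_work) instead of directly from the submitter's context.
 */
#include <linux/blk-mq.h>
#include <linux/numa.h>
#include <linux/string.h>

static int example_setup_tag_set(struct blk_mq_tag_set *set,
				 const struct blk_mq_ops *ops)
{
	memset(set, 0, sizeof(*set));
	set->ops = ops;
	set->nr_hw_queues = 1;		/* SQ hardware: one hctx, one run_work */
	set->queue_depth = 128;
	set->numa_node = NUMA_NO_NODE;
	set->flags = BLK_MQ_F_BLOCKING;	/* ->queue_rq() may block */
	return blk_mq_alloc_tag_set(set);
}

With nr_hw_queues = 1, that single run_work serializes dispatch, which
is the contention Ming describes; nvme/tcp sidesteps it by exposing one
hw queue per TCP connection.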