On 12/18/18 11:08 AM, Kashyap Desai wrote:
>> On 12/18/18 10:48 AM, Kashyap Desai wrote:
>>>>
>>>> On 12/18/18 10:08 AM, Kashyap Desai wrote:
>>>>>>>
>>>>>>> Other block drivers (e.g. ib_srp, skd) do not need this to work
>>>>>>> reliably. It has been explained to you that the bug that you
>>>>>>> reported can be fixed by modifying the mpt3sas driver. So why fix
>>>>>>> this by modifying the block layer? Additionally, what prevents a
>>>>>>> race condition between the block layer clearing
>>>>>>> hctx->tags->rqs[rq->tag] and scsi_host_find_tag() reading that
>>>>>>> same array element? I'm afraid that this is an attempt to paper
>>>>>>> over a real problem instead of fixing the root cause.
>>>>>>
>>>>>> I have to agree with Bart here, I just don't see how the mpt3sas
>>>>>> use case is special. The change will paper over the issue in any
>>>>>> case.
>>>>>
>>>>> Hi Jens, Bart,
>>>>>
>>>>> One of the key requirements for iterating the whole tagset using
>>>>> scsi_host_find_tag() is to block the scsi host. Once we have done
>>>>> that, we should be good; no race condition is possible if that part
>>>>> is taken care of.
>>>>> Without this patch, the driver can still receive a scsi command from
>>>>> hctx->tags->rqs which is really not outstanding. I am finding this is
>>>>> a common issue for many scsi low level drivers.
>>>>>
>>>>> Just for example, <fnic> - the fnic_is_abts_pending() function has
>>>>> the below code -
>>>>>
>>>>>     for (tag = 0; tag < fnic->fnic_max_tag_id; tag++) {
>>>>>             sc = scsi_host_find_tag(fnic->lport->host, tag);
>>>>>             /*
>>>>>              * ignore this lun reset cmd or cmds that do not
>>>>>              * belong to this lun
>>>>>              */
>>>>>             if (!sc || (lr_sc && (sc->device != lun_dev || sc == lr_sc)))
>>>>>                     continue;
>>>>>
>>>>> The above code has a similar exposure to kernel panic as the
>>>>> <mpt3sas> driver while accessing sc->device.
>>>>>
>>>>> The panic is more obvious if we have add/removal of a scsi device
>>>>> before looping through scsi_host_find_tag().
>>>>>
>>>>> Avoiding block layer changes was also attempted in <mpt3sas>, but
>>>>> our problem is to keep that code common for non-mq and mq.
>>>>> Temporarily, to unblock this issue, we have fixed <mpt3sas> using
>>>>> the driver-internal scsiio_tracker() instead of piggybacking on the
>>>>> scsi_command.
>>>>
>>>> For mq, the requests never go out of scope, they are always valid. So
>>>> the key question here is WHY they have been freed. If the queue gets
>>>> killed, then one potential solution would be to clear pointers in the
>>>> tag map belonging to that queue. That also takes it out of the hot
>>>> path.
>>>
>>> At driver load, whenever the driver does scsi_add_host_with_dma(), it
>>> follows the below code path in the block layer:
>>>
>>> scsi_mq_setup_tags
>>>   -> blk_mq_alloc_tag_set
>>>     -> blk_mq_alloc_rq_maps
>>>       -> __blk_mq_alloc_rq_maps
>>>
>>> The SML creates two sets of request pools: one per HBA and one per
>>> SDEV. I was confused about why the SML creates a request pool per HBA.
>>>
>>> Example - if the HBA queue depth is 1K and there are 8 devices behind
>>> that HBA, the total request pool created is 1K + 8 * scsi_device queue
>>> depth. The 1K is always static, but the other request pools are
>>> managed whenever a scsi device is added/removed.
>>>
>>> I never observe the requests allocated per HBA being used in the IO
>>> path; it is always the requests allocated per scsi device that are
>>> active.
>>> Also, what I observed is that whenever a scsi_device is deleted, the
>>> associated requests are also deleted.
>>> What is missing is - "A deleted request is still available in
>>> hctx->tags->rqs[rq->tag]."
>>
>> So that sounds like the issue. If the device is deleted and its
>> requests go away, those pointers should be cleared. That's what your
>> patch should do, not do it for each IO.
>
> At the time of device removal, it requires reverse traversal: find out
> whether each request associated with the sdev is part of
> hctx->tags->rqs[] and clear that entry.
> I am not sure about atomic traversal if more than one device removal is
> happening in parallel; it may be more error prone.
>
> Just wondering - either way we will be removing invalid requests from
> the array. Are you suspecting any performance issue if we do it per IO?

It's an extra store, and it's a store to an area that's now shared
between issue and completion. Those are never a good idea. Besides, it's
the kind of issue you solve in the SLOW path, not in the fast path.
Since that's doable, it would be silly to do it for every IO. This might
not matter on mpt3sas, but on more efficient hw it definitely will.

I'm still trying to convince myself that this issue even exists. I can
see having stale entries, but those should never be busy. Why are you
finding them with the tag iteration? It must be because the tag is
reused, and you are finding it before it's re-assigned?

-- 
Jens Axboe
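
[Editorial illustration] A minimal sketch of the slow-path approach Jens
describes above - clearing a queue's stale entries out of the shared tag
map when its requests go away - might look roughly like the following.
This is not the patch under discussion: the function name
clear_stale_tag_entries() is hypothetical, the code assumes it lives
inside the block layer (struct blk_mq_tags is private to
block/blk-mq-tag.h), it assumes the caller has already quiesced the
queue, and it ignores the ordering needed against a concurrent
scsi_host_find_tag() walk (the race Bart raises above).

#include <linux/blkdev.h>
#include <linux/blk-mq.h>
#include "blk-mq-tag.h"	/* block-layer private: struct blk_mq_tags */

/* Hypothetical slow-path helper, called while the queue is quiesced. */
static void clear_stale_tag_entries(struct request_queue *q)
{
	struct blk_mq_hw_ctx *hctx;
	unsigned int i, tag;

	queue_for_each_hw_ctx(q, hctx, i) {
		struct blk_mq_tags *tags = hctx->tags;

		for (tag = 0; tag < tags->nr_tags; tag++) {
			struct request *rq = tags->rqs[tag];

			/*
			 * The tag map is shared by every queue in the
			 * tag set, so only drop entries that belong to
			 * the queue being torn down.
			 */
			if (rq && rq->q == q)
				tags->rqs[tag] = NULL;
		}
	}
}

Because this runs only at device/queue teardown, it keeps the extra
store out of the per-IO issue/completion path that Jens objects to.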