Re: [PATCH V5] scsi: core: only re-run queue in scsi_end_request() if device queue is busy

Bart Van Assche <bvanassche@xxxxxxx> · Mon, 7 Sep 2020 20:45:32 -0700

On 2020-09-07 18:47, Ming Lei wrote:
> On Mon, Sep 07, 2020 at 09:52:42AM -0700, Bart Van Assche wrote:
>> On 2020-09-07 00:10, Ming Lei wrote:
>>> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
>>> index 7affaaf8b98e..a05e431ee62a 100644
>>> --- a/drivers/scsi/scsi_lib.c
>>> +++ b/drivers/scsi/scsi_lib.c
>>> @@ -551,8 +551,25 @@ static void scsi_run_queue_async(struct scsi_device *sdev)
>>>  	if (scsi_target(sdev)->single_lun ||
>>>  	    !list_empty(&sdev->host->starved_list))
>>>  		kblockd_schedule_work(&sdev->requeue_work);
>>> -	else
>>> -		blk_mq_run_hw_queues(sdev->request_queue, true);
>>> +	else {
>>
>> Please follow the Linux kernel coding style and balance braces.
> 
> Could you provide one document about such style? The patch does pass
> checkpatch, or I am happy to follow your suggestion if checkpatch is
> updated to this way.

Apparently the checkpatch script only warns about unbalanced braces with the
option --strict. From commit e4c5babd32f9 ("checkpatch: notice unbalanced
else braces in a patch") # v4.11:

    checkpatch: notice unbalanced else braces in a patch

    Patches that add or modify code like

            } else
                    <foo>
    or
            else {
                    <bar>

    where one branch appears to have a brace and the other branch does not
    have a brace should emit a --strict style message.

[ ... ]

+# check for single line unbalanced braces
+		if ($sline =~ /.\s*\}\s*else\s*$/ ||
+		    $sline =~ /.\s*else\s*\{\s*$/) {
+			CHK("BRACES", "Unbalanced braces around else statement\n" . $herecurr);
+		}
+

Anyway, I think the following output makes it clear that there are many more
balanced than non-balanced else statements:

$ git grep -c "} else {" | awk 'BEGIN {FS=":"} {total+=$2} END {print total}'
66944
$ git grep -Ec "$(printf "\t")else \{|\} else$" | awk 'BEGIN {FS=":"} {total+=$2} END {print total}'
12289

>>> +		/*
>>> +		 * smp_mb() implied in either rq->end_io or blk_mq_free_request
>>> +		 * is for ordering writing .device_busy in scsi_device_unbusy()
>>> +		 * and reading sdev->restarts.
>>> +		 */
>>> +		int old = atomic_read(&sdev->restarts);
>>
>> scsi_run_queue_async() has two callers: scsi_end_request() and scsi_queue_rq().
>> I don't see how ordering between scsi_device_unbusy() and the above atomic_read()
>> could be guaranteed if this function is called from scsi_queue_rq()?
>>
>> Regarding the I/O completion path, my understanding is that the I/O completion
>> path is as follows if rq->end_io == NULL:
>>
>> scsi_mq_done()
>>   blk_mq_complete_request()
>>     rq->q->mq_ops->complete(rq) = scsi_softirq_done
>>       scsi_finish_command()
>>         scsi_device_unbusy()
> 
> scsi_device_unbusy()
> 	atomic_dec(&sdev->device_busy);
> 
>>         scsi_cmd_to_driver(cmd)->done(cmd)
>>         scsi_io_completion()
>>           scsi_end_request()
>>             blk_update_request()
>>             scsi_mq_uninit_cmd()
>>             __blk_mq_end_request()
>>               blk_mq_free_request()
>>                 __blk_mq_free_request()
> 
> __blk_mq_free_request()
> 	blk_mq_put_tag
> 		smp_mb__after_atomic()
> 

Thanks for the clarification. How about changing the text "implied in either
rq->end_io or blk_mq_free_request" into "present in sbitmap_queue_clear()"
such that the person who reads the comment does not have to look up where
the barrier occurs?

>>
>>> +	/*
>>> +	 * Order writing .restarts and reading .device_busy. Its pair is
>>> +	 * implied by __blk_mq_end_request() in scsi_end_request() for
>>> +	 * ordering writing .device_busy in scsi_device_unbusy() and
>>> +	 * reading .restarts.
>>> +	 */
>>> +	smp_mb__after_atomic();
>>
>> What does "its pair is implied" mean? Please make the above comment
>> unambiguous.
> 
> See comment in scsi_run_queue_async().

How about making the above comment more by changing it into the following?
/*
 * Orders atomic_inc(&sdev->restarts) and atomic_read(&sdev->device_busy).
 * .restarts must be incremented before .device_busy is read because the code
 * in scsi_run_queue_async() depends on the order of these operations.
 */

>> Will that cause the queue to be run after a delay
>> although it should be run immediately?
> 
> Yeah, blk_mq_delay_run_hw_queues() will be called, however:
> 
> If scsi_run_queue_async() has scheduled run queue already, this code path
> won't queue a dwork successfully. On the other hand, if
> blk_mq_delay_run_hw_queues(SCSI_QUEUE_DELAY) has queued a dwork,
> scsi_run_queue_async() still can queue the dwork successfully, since the delay
> timer can be deactivated easily, see try_to_grab_pending(). In short, the case
> you described is an extremely unlikely event. Even though it happens,
> forward progress is still guaranteed.

I think I would sleep better if that race would be fixed. I'm concerned
that sooner or later someone will run a workload that triggers that scenario
systematically ...

Thanks,

Bart.