On Mon, Sep 07, 2020 at 09:52:42AM -0700, Bart Van Assche wrote:
> On 2020-09-07 00:10, Ming Lei wrote:
> > diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> > index 7affaaf8b98e..a05e431ee62a 100644
> > --- a/drivers/scsi/scsi_lib.c
> > +++ b/drivers/scsi/scsi_lib.c
> > @@ -551,8 +551,25 @@ static void scsi_run_queue_async(struct scsi_device *sdev)
> >  	if (scsi_target(sdev)->single_lun ||
> >  	    !list_empty(&sdev->host->starved_list))
> >  		kblockd_schedule_work(&sdev->requeue_work);
> > -	else
> > -		blk_mq_run_hw_queues(sdev->request_queue, true);
> > +	else {
>
> Please follow the Linux kernel coding style and balance braces.

Could you point me to a document describing that style? The patch does
pass checkpatch. That said, I am happy to follow your suggestion if
checkpatch is updated to check for this.

> > +		/*
> > +		 * smp_mb() implied in either rq->end_io or blk_mq_free_request
> > +		 * is for ordering writing .device_busy in scsi_device_unbusy()
> > +		 * and reading sdev->restarts.
> > +		 */
> > +		int old = atomic_read(&sdev->restarts);
>
> scsi_run_queue_async() has two callers: scsi_end_request() and scsi_queue_rq().
> I don't see how ordering between scsi_device_unbusy() and the above atomic_read()
> could be guaranteed if this function is called from scsi_queue_rq()?
>
> Regarding the I/O completion path, my understanding is that the I/O completion
> path is as follows if rq->end_io == NULL:
>
> scsi_mq_done()
>   blk_mq_complete_request()
>     rq->q->mq_ops->complete(rq) = scsi_softirq_done
>       scsi_finish_command()
>         scsi_device_unbusy()

scsi_device_unbusy()
	atomic_dec(&sdev->device_busy);

>         scsi_cmd_to_driver(cmd)->done(cmd)
>           scsi_io_completion()
>             scsi_end_request()
>               blk_update_request()
>               scsi_mq_uninit_cmd()
>               __blk_mq_end_request()
>                 blk_mq_free_request()
>                   __blk_mq_free_request()

__blk_mq_free_request()
	blk_mq_put_tag()
		smp_mb__after_atomic()

>               blk_queue_exit()
>               scsi_run_queue_async()
>
> I haven't found any store memory barrier between the .device_busy change in
> scsi_device_unbusy() and the scsi_run_queue_async() call? Did I perhaps overlook
> something?

As annotated above: the smp_mb__after_atomic() implied when the driver
tag is released under __blk_mq_free_request() sits between the
.device_busy decrement and scsi_run_queue_async(), and that is the
implied smp_mb() the comment refers to.

> > +		/*
> > +		 * ->restarts has to be kept as non-zero if there new budget
> > +		 * contention comes.
>
> Please fix the grammar in the above sentence.

OK.

> > +	/*
> > +	 * Order writing .restarts and reading .device_busy. Its pair is
> > +	 * implied by __blk_mq_end_request() in scsi_end_request() for
> > +	 * ordering writing .device_busy in scsi_device_unbusy() and
> > +	 * reading .restarts.
> > +	 */
> > +	smp_mb__after_atomic();
>
> What does "its pair is implied" mean? Please make the above comment
> unambiguous.

See the comment in scsi_run_queue_async() quoted above: the two barriers
pair with each other, so that at least one side is guaranteed to observe
the other side's update.
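To make the pairing concrete, below is a minimal single-threaded
userspace sketch of the store-buffer pattern the two comments describe.
It is illustrative only: sdev_restarts/sdev_device_busy stand in for
sdev->restarts/sdev->device_busy, and the C11 seq_cst fences stand in
for smp_mb__after_atomic() and the implied smp_mb() on the completion
side; this is not the scsi_lib.c code itself.

#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

static atomic_int sdev_restarts;	/* stand-in for sdev->restarts */
static atomic_int sdev_device_busy;	/* stand-in for sdev->device_busy */

/* Submission side: budget contention was just recorded. */
static bool submitter_should_run_queue(void)
{
	atomic_fetch_add_explicit(&sdev_restarts, 1, memory_order_relaxed);
	/* models smp_mb__after_atomic() in the budget-contention path */
	atomic_thread_fence(memory_order_seq_cst);
	return atomic_load_explicit(&sdev_device_busy,
				    memory_order_relaxed) == 0;
}

/* Completion side: one in-flight request finishes. */
static bool completer_should_run_queue(void)
{
	atomic_fetch_sub_explicit(&sdev_device_busy, 1, memory_order_relaxed);
	/* models the smp_mb() implied when the driver tag is released */
	atomic_thread_fence(memory_order_seq_cst);
	return atomic_load_explicit(&sdev_restarts,
				    memory_order_relaxed) != 0;
}

int main(void)
{
	atomic_store(&sdev_device_busy, 1);	/* one request in flight */

	bool a = submitter_should_run_queue();
	bool b = completer_should_run_queue();

	/* In this interleaving the completer sees .restarts != 0. */
	printf("submitter re-runs queue: %d, completer re-runs queue: %d\n",
	       a, b);
	return 0;
}

With two racing threads, the pair of full fences guarantees that at
least one of the two functions returns true, so at least one side
re-runs the queue and no request is left stranded.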
> > +	/*
> > +	 * If all in-flight requests originated from this LUN are completed
> > +	 * before setting .restarts, sdev->device_busy will be observed as
> > +	 * zero, then blk_mq_delay_run_hw_queues() will dispatch this request
> > +	 * soon. Otherwise, completion of one of these request will observe
> > +	 * the .restarts flag, and the request queue will be run for handling
> > +	 * this request, see scsi_end_request().
> > +	 */
> > +	if (unlikely(atomic_read(&sdev->device_busy) == 0 &&
> > +				!scsi_device_blocked(sdev)))
> > +		blk_mq_delay_run_hw_queues(sdev->request_queue, SCSI_QUEUE_DELAY);
> > +	return false;
>
> What will happen if all in-flight requests complete after
> scsi_run_queue_async() has read .restarts and before it executes
> atomic_cmpxchg()?

One of these completions will run atomic_cmpxchg() successfully, and the
queue is re-run immediately from scsi_run_queue_async().

> Will that cause the queue to be run after a delay
> although it should be run immediately?

Yeah, blk_mq_delay_run_hw_queues() will be called too, however:

If scsi_run_queue_async() has already scheduled the run-queue work, this
code path won't queue the dwork again.

On the other hand, if blk_mq_delay_run_hw_queues(SCSI_QUEUE_DELAY) has
queued the dwork first, scsi_run_queue_async() can still queue it
immediately, since the delay timer can be deactivated easily, see
try_to_grab_pending().

In short, the case you described is an extremely unlikely event, and even
if it happens, forward progress is still guaranteed.
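The scenario you raise turns on what the cmpxchg does when it races with
new budget contention. Below is a minimal userspace sketch of that
clear-only-if-unchanged pattern; the names and the forced single-threaded
interleaving are illustrative stand-ins, not the kernel code:

#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

static atomic_int restarts;	/* stand-in for sdev->restarts */

static void run_hw_queues(void)
{
	printf("queue re-run\n");
}

int main(void)
{
	atomic_store(&restarts, 1);	/* one waiter already recorded */

	/* Completion path samples ->restarts and re-runs the queue ... */
	int old = atomic_load(&restarts);
	if (old)
		run_hw_queues();

	/* ... a new submitter races in with fresh budget contention ... */
	atomic_fetch_add(&restarts, 1);

	/*
	 * ... so clearing with the stale expected value fails and the
	 * counter stays non-zero; a later completion sees it and re-runs
	 * the queue for the new waiter too.
	 */
	bool cleared = atomic_compare_exchange_strong(&restarts, &old, 0);
	printf("cleared=%d, restarts=%d\n", cleared,
	       atomic_load(&restarts));	/* prints: cleared=0, restarts=2 */
	return 0;
}

Because a failed compare leaves the counter non-zero, new budget
contention is never lost, which is why the comment says ->restarts has
to be kept non-zero in that case.

Thanks,
Ming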