On Tue, Sep 18, 2018 at 04:25:33PM +0800, jianchao.wang wrote: > > > On 09/18/2018 03:55 PM, Ming Lei wrote: > > On Tue, Sep 18, 2018 at 03:51:10PM +0800, jianchao.wang wrote: > >> > >> > >> On 09/18/2018 03:39 PM, Ming Lei wrote: > >>> On Tue, Sep 18, 2018 at 09:22:32AM +0800, jianchao.wang wrote: > >>>> Hi Ming > >>>> > >>>> On 09/17/2018 07:35 PM, Ming Lei wrote: > >>>>> On Mon, Sep 17, 2018 at 10:46:34AM +0800, jianchao.wang wrote: > >>>>>> Hi Ming > >>>>>> > >>>>>> On 09/14/2018 07:33 PM, Ming Lei wrote: > >>>>>>> On Fri, Sep 14, 2018 at 11:40 AM jianchao.wang > >>>>>>> <jianchao.w.wang@xxxxxxxxxx> wrote: > >>>>>>>> > >>>>>>>> Hi Ming > >>>>>>>> > >>>>>>>> On 09/13/2018 08:15 PM, Ming Lei wrote: > >>>>>>>>> EXPORT_SYMBOL(__scsi_execute); > >>>>>>>>> @@ -3246,6 +3251,7 @@ static int scsi_internal_device_block(struct scsi_device *sdev) > >>>>>>>>> else > >>>>>>>>> scsi_wait_for_queuecommand(sdev); > >>>>>>>>> } > >>>>>>>>> + wait_event(sdev->admin_wq, !atomic_read(&sdev->nr_admin_pending)); > >>>>>>>>> mutex_unlock(&sdev->state_mutex); > >>>>>>>>> > >>>>>>>>> return err; > >>>>>>>> ... > >>>>>>>>> diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c > >>>>>>>>> index 3aee9464a7bf..8bcb7ecc0c06 100644 > >>>>>>>>> --- a/drivers/scsi/scsi_sysfs.c > >>>>>>>>> +++ b/drivers/scsi/scsi_sysfs.c > >>>>>>>>> @@ -1393,6 +1393,7 @@ void __scsi_remove_device(struct scsi_device *sdev) > >>>>>>>>> > >>>>>>>>> blk_cleanup_queue(sdev->request_queue); > >>>>>>>>> cancel_work_sync(&sdev->requeue_work); > >>>>>>>>> + wait_event(sdev->admin_wq, !atomic_read(&sdev->nr_admin_pending)) > >>>>>>>> > >>>>>>>> This nr_admin_pending could drain the ongoing scsi_request_fn/scsi_queue_rq, > >>>>>>>> but I'm afraid it cannot stop new ones coming in, such as the ones that have passed > >>>>>>>> the sdev state checking and have not crossed the atomic_inc(&sdev->nr_admin_pending). > >>>>>>>> > >>>>>>> > >>>>>>> The counter of .nr_admin_pending is introduced for draining queued > >>>>>>> admin requests to this scsi device. > >>>>>>> > >>>>>>> Actually new requests have been prevented from entering scsi_queue_rq(), > >>>>>>> please see the two callers of wait_event(sdev->admin_wq, > >>>>>>> !atomic_read(&sdev->nr_admin_pending)). > >>>>>>> > >>>>>> For example > >>>>>> > >>>>>> _scsi_execute > >>>>>> ... > >>>>>> scsi_internal_device_block > >>>>>> scsi_internal_device_block_nowait > >>>>>> blk_mq_quiesce_queue > >>>>>> wait_event(sdev->admin_wq, !atomic_read(&sdev->nr_admin_pending)) > >>>>>> &sdev->nr_admin_pending; > >>>>>> > >>>>>> blk_execute_rq(...) > >>>>>> > >>>>>> atomic_dec(&sdev->nr_admin_pending); > >>>>>> wake_up_all(&sdev->admin_wq); > >>>>>> > >>>>>> Or do you mean the scsi_queue_rq -> scsi_prep_state_check could gate out of ? > >>>>> > >>>>> I got it, then this issue can be fixed simply by moving atomic_inc/dec(&sdev->nr_admin_pending) > >>>>> and related wake_up_all(&sdev->admin_wq) into scsi_admin_queue_rq(). > >>>>> > >>>> > >>>> I don't think so. It is a similar scenario. > >>>> > >>>> I guess a state checking is needed after atomic_inc(&sdev->nr_admin_pending), like: > >>>> > >>>> _scsi_execute > >>>> ... > >>>> scsi_internal_device_block > >>>> scsi_internal_device_block_nowait > >>>> blk_mq_quiesce_queue > >>>> wait_event(sdev->admin_wq, !atomic_read(&sdev->nr_admin_pending)) > >>>> atomic_inc(&sdev->nr_admin_pending); > >>>> if state checking fails > >>>> goto done > >>> > >>> The check will be done in scsi_admin_queue_rq(). > >>> > >>>> > >>>> blk_execute_rq(...) > >>>> > >>>> atomic_dec(&sdev->nr_admin_pending); > >>>> wake_up_all(&sdev->admin_wq); > >>> > >>> I guess you may misunderstand the purpose of .nr_admin_pending, which is > >>> for draining requests to .queue_rq(). So it is enough to just move the > >>> inc/dec of .nr_admin_pending into scsi_admin_queue_rq(), right? > >> Yes. > > > > Thanks for your confirmation. > > > >> > >> But I just think of how to assign with queue quiesce. > >> The existence of per-host adminq seems to break it. > > > > The per-host adminq won't be frozen or quiesced at all, > > This is the place which makes me confused. > > Before this patchset, when scsi_internal_device_block quiesces the request_queue, > Both normal IO or admin requests cannot be processed any more. > > Given the per-adminq, we could say every scsi device has two request_queues, or > there could be two request_queues sending requests to the a scsi_device: > - normal IO request_queue > - shared per-host adminq > > We could quiesce the normal IO request_queue, but what's about the per-host adminq ? The adminq request can be quisced in the scsi_device level, just as normal IO request, because the scsi_device instance is same with normal IO. But in blk-mq level, this admin queue won't be quiesced or frozen at all. > > > could you explain > > your concern in a bit detail about 'assign with queue quiesce'? > > > > The scsi_queue_rq->scsi_prep_state_check could stop requests entering further, but > the scsi_queue_rq is still invoked. As I mentioned, there isn't any change in scsi level about handling the admin request because the scsi_device is shared, and scsi_host too. Thanks, Ming