On 4/1/19 1:16 PM, Ming Lei wrote:
> Hi Dongli,
>
> On Mon, Apr 01, 2019 at 01:05:46PM +0800, Dongli Zhang wrote:
>>
>>
>> On 4/1/19 10:52 AM, Ming Lei wrote:
>>> On Sun, Mar 31, 2019 at 07:39:17PM -0700, Bart Van Assche wrote:
>>>> On 3/31/19 7:00 PM, Ming Lei wrote:
>>>>> On Sun, Mar 31, 2019 at 08:27:35AM -0700, Bart Van Assche wrote:
>>>>>> I'm not sure the approach of this patch series is really the direction we
>>>>>> should pursue. There are many block driver that free resources immediately
>>>>>
>>>>> Please see scsi_run_queue(), and the queue refcount is always held
>>>>> before run queue.
>>>>
>>>> That's not correct. There is no guarantee that q->q_usage_counter > 0 when
>>>> scsi_run_queue() is called from inside scsi_requeue_run_queue().
>>>
>>> We don't need the guarantee of 'q->q_usage_counter > 0', I mean the
>>> queue's kobj reference counter.
>>>
>>> What we need is to allow run queue to work correctly after queue is frozen
>>> or cleaned up.
>>>
>>>>
>>>>>> I'd like to avoid having to modify all block drivers that free resources
>>>>>> immediately after blk_cleanup_queue() has returned. Have you considered to
>>>>>> modify blk_mq_run_hw_queues() such that it becomes safe to call that
>>>>>> function while blk_cleanup_queue() is in progress, e.g. by inserting a
>>>>>> percpu_ref_tryget_live(&q->q_usage_counter) /
>>>>>> percpu_ref_put(&q->q_usage_counter) pair?
>>>>>
>>>>> It can't work because blk_mq_run_hw_queues may happen after
>>>>> percpu_ref_exit() is done.
>>>>>
>>>>> However, if we move percpu_ref_exit() into queue's release handler, we
>>>>> don't need to grab q->q_usage_counter any more in blk_mq_run_hw_queues(),
>>>>> and we still have to free hw queue resources in queue's release handler,
>>>>> that is exactly what this patchset is doing.
>>>>>
>>>>> In short, getting q->q_usage_counter doesn't make a difference on this
>>>>> issue.
>>>>
>>>> percpu_ref_tryget_live() fails if a per-cpu counter is in the "dead" state.
>>>> percpu_ref_kill() changes the state of a per-cpu counter to the "dead"
>>>> state. blk_freeze_queue_start() calls percpu_ref_kill(). blk_cleanup_queue()
>>>> already calls blk_set_queue_dying() and that last function calls
>>>> blk_freeze_queue_start(). So I think that what you wrote is not correct and
>>>> that inserting a percpu_ref_tryget_live()/percpu_ref_put() pair in
>>>> blk_mq_run_hw_queues() or blk_mq_run_hw_queue() would make a difference and
>>>> also that moving the percpu_ref_exit() call into blk_release_queue() makes
>>>> sense.
>>>
>>> If percpu_ref_exit() is moved to blk_release_queue(), we still need to
>>> move freeing of hw queue's resource into blk_release_queue() like what
>>> the patchset is doing.
>>
>> Hi Ming,
>>
>> Would you mind help explain why we still need to move freeing of hw queue's
>> resource into blk_release_queue() like what the patchset is doing?
>>
>> Let's assume there is no deadlock when percpu_ref_tryget_live() is used,
>
> Could you explain why the assumption is true?
>
> We have to run queue after starting to freeze queue for draining
> allocated requests and making forward progress. Inside blk_freeze_queue_start(),
> percpu_ref_kill() marks this ref as DEAD, then percpu_ref_tryget_live() returns
> false, then queue won't be run.

Hi Ming,

I understand that the assumption is invalid and that there is an issue
with using percpu_ref_tryget_live().

I also understand that we have to run the queue after starting to freeze
it, in order to drain allocated requests and make forward progress.

I am just wondering specifically why "If percpu_ref_exit() is moved to
blk_release_queue(), we still need to move freeing of hw queue's resource
into blk_release_queue() like what the patchset is doing." holds, given
Bart's statement below:

"percpu_ref_tryget_live() fails if a per-cpu counter is in the "dead" state.
percpu_ref_kill() changes the state of a per-cpu counter to the "dead"
state. blk_freeze_queue_start() calls percpu_ref_kill(). blk_cleanup_queue()
already calls blk_set_queue_dying() and that last function calls
blk_freeze_queue_start(). So I think that what you wrote is not correct and
that inserting a percpu_ref_tryget_live()/percpu_ref_put() pair in
blk_mq_run_hw_queues() or blk_mq_run_hw_queue() would make a difference and
also that moving the percpu_ref_exit() call into blk_release_queue() makes
sense."

That is, what is the penalty if we do not move the freeing of the hw
queue's resources into blk_release_queue(), as the patchset does, in the
above situation?

I ask this question just because I would like to better understand the
source code.

Does "hw queue's resource" refer to the following?

+	if (hctx->flags & BLK_MQ_F_BLOCKING)
+		cleanup_srcu_struct(hctx->srcu);
+	blk_free_flush_queue(hctx->fq);
+	sbitmap_free(&hctx->ctx_map);

Thank you very much!

Dongli Zhang
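
For readers following the thread, the stall Ming describes can be
illustrated with a small userspace model. This is only a sketch under
stated assumptions: every toy_* name is hypothetical and is not the
kernel's percpu_ref or blk-mq API, the counter is a plain integer rather
than a per-cpu one, and each pending request is assumed to hold one
reference on the usage counter. It shows that once the counter has been
killed, a run-queue path guarded by a "tryget live" check refuses to run,
so the pending requests (and the references they hold) are never drained.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a reference counter with a "dead" state.
 * Hypothetical model only; not the kernel's percpu_ref. */
struct toy_ref {
	long count;	/* outstanding references */
	bool dead;	/* set once by toy_ref_kill() */
};

/* Succeeds only while the ref has not been killed. */
static bool toy_ref_tryget_live(struct toy_ref *ref)
{
	if (ref->dead)
		return false;
	ref->count++;
	return true;
}

static void toy_ref_put(struct toy_ref *ref)
{
	assert(ref->count > 0);
	ref->count--;
}

/* Mark the ref dead and drop the initial reference. */
static void toy_ref_kill(struct toy_ref *ref)
{
	ref->dead = true;
	toy_ref_put(ref);
}

/* Toy queue: 'usage' models q->q_usage_counter. */
struct toy_queue {
	struct toy_ref usage;
	int pending;	/* allocated requests not yet completed */
};

/* Assumption of this model: one initial ref plus one ref per request. */
static void toy_queue_init(struct toy_queue *q, int pending)
{
	q->usage.count = 1 + pending;
	q->usage.dead = false;
	q->pending = pending;
}

/* Hypothetical run-queue guarded by a tryget_live/put pair.
 * Returns false, without dispatching, when the ref is dead. */
static bool toy_run_queue_guarded(struct toy_queue *q)
{
	if (!toy_ref_tryget_live(&q->usage))
		return false;

	/* Complete every pending request and drop its reference. */
	while (q->pending > 0) {
		q->pending--;
		toy_ref_put(&q->usage);
	}

	toy_ref_put(&q->usage);
	return true;
}
```

In this model, calling toy_ref_kill() before toy_run_queue_guarded()
leaves q->pending and q->usage.count unchanged at their non-zero values,
which corresponds to a freeze that cannot complete because the queue is
never run.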