Re: [PATCH 0/5] blk-mq: allow to run queue if queue refcount is held

On Tue, Apr 2, 2019 at 10:06 AM Dongli Zhang <dongli.zhang@xxxxxxxxxx> wrote:
>
>
>
> On 4/1/19 3:15 PM, Ming Lei wrote:
> > On Mon, Apr 1, 2019 at 1:27 PM Dongli Zhang <dongli.zhang@xxxxxxxxxx> wrote:
> >>
> >>
> >>
> >> On 4/1/19 1:16 PM, Ming Lei wrote:
> >>> Hi Dongli,
> >>>
> >>> On Mon, Apr 01, 2019 at 01:05:46PM +0800, Dongli Zhang wrote:
> >>>>
> >>>>
> >>>> On 4/1/19 10:52 AM, Ming Lei wrote:
> >>>>> On Sun, Mar 31, 2019 at 07:39:17PM -0700, Bart Van Assche wrote:
> >>>>>> On 3/31/19 7:00 PM, Ming Lei wrote:
> >>>>>>> On Sun, Mar 31, 2019 at 08:27:35AM -0700, Bart Van Assche wrote:
> >>>>>>>> I'm not sure the approach of this patch series is really the direction we
> >>>>>>>> should pursue. There are many block drivers that free resources immediately
> >>>>>>>
> >>>>>>> Please see scsi_run_queue(), and the queue refcount is always held
> >>>>>>> before run queue.
> >>>>>>
> >>>>>> That's not correct. There is no guarantee that q->q_usage_counter > 0 when
> >>>>>> scsi_run_queue() is called from inside scsi_requeue_run_queue().
> >>>>>
> >>>>> We don't need the guarantee that 'q->q_usage_counter > 0'; I mean the
> >>>>> queue's kobj reference counter.
> >>>>>
> >>>>> What we need is for running the queue to work correctly after the queue
> >>>>> is frozen or cleaned up.
> >>>>>
> >>>>>>
> >>>>>>>> I'd like to avoid having to modify all block drivers that free resources
> >>>>>>>> immediately after blk_cleanup_queue() has returned. Have you considered
> >>>>>>>> modifying blk_mq_run_hw_queues() such that it becomes safe to call that
> >>>>>>>> function while blk_cleanup_queue() is in progress, e.g. by inserting a
> >>>>>>>> percpu_ref_tryget_live(&q->q_usage_counter) /
> >>>>>>>> percpu_ref_put(&q->q_usage_counter) pair?
> >>>>>>>
> >>>>>>> It can't work because blk_mq_run_hw_queues() may be called after
> >>>>>>> percpu_ref_exit() is done.
> >>>>>>>
> >>>>>>> However, if we move percpu_ref_exit() into the queue's release handler, we
> >>>>>>> don't need to grab q->q_usage_counter anymore in blk_mq_run_hw_queues(),
> >>>>>>> and we still have to free hw queue resources in the queue's release handler,
> >>>>>>> which is exactly what this patchset is doing.
> >>>>>>>
> >>>>>>> In short, getting q->q_usage_counter doesn't make a difference to this
> >>>>>>> issue.
> >>>>>>
> >>>>>> percpu_ref_tryget_live() fails if a per-cpu counter is in the "dead" state.
> >>>>>> percpu_ref_kill() changes the state of a per-cpu counter to the "dead"
> >>>>>> state. blk_freeze_queue_start() calls percpu_ref_kill(). blk_cleanup_queue()
> >>>>>> already calls blk_set_queue_dying() and that last function calls
> >>>>>> blk_freeze_queue_start(). So I think that what you wrote is not correct and
> >>>>>> that inserting a percpu_ref_tryget_live()/percpu_ref_put() pair in
> >>>>>> blk_mq_run_hw_queues() or blk_mq_run_hw_queue() would make a difference and
> >>>>>> also that moving the percpu_ref_exit() call into blk_release_queue() makes
> >>>>>> sense.
> >>>>>
> >>>>> If percpu_ref_exit() is moved to blk_release_queue(), we still need to
> >>>>> move freeing of hw queue's resource into blk_release_queue() like what
> >>>>> the patchset is doing.
> >>>>
> >>>> Hi Ming,
> >>>>
> >>>> Would you mind helping explain why we still need to move freeing of hw queue's
> >>>> resource into blk_release_queue() like what the patchset is doing?
> >>>>
> >>>> Let's assume there is no deadlock when percpu_ref_tryget_live() is used,
> >>>
> >>> Could you explain why the assumption is true?
> >>>
> >>> We have to run the queue after starting to freeze it, in order to drain
> >>> allocated requests and make forward progress. Inside blk_freeze_queue_start(),
> >>> percpu_ref_kill() marks this ref as DEAD, so percpu_ref_tryget_live() returns
> >>> false and the queue won't be run.
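The failure mode described here can be illustrated with a simplified, single-threaded userspace model. All names (`struct model_ref`, `model_ref_kill()`, `model_ref_tryget_live()`, `model_run_hw_queues()`) are hypothetical stand-ins, not kernel APIs; the model only mimics two documented properties: percpu_ref_kill() marks the ref dead and drops the initial reference, and percpu_ref_tryget_live() fails once the ref is dead.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, single-threaded stand-in for struct percpu_ref that keeps
 * only the state relevant here (this is NOT the kernel implementation). */
struct model_ref {
	long count;  /* outstanding references, incl. the initial one */
	bool dead;   /* analogue of the __PERCPU_REF_DEAD flag */
};

/* Analogue of percpu_ref_kill(): mark dead and drop the initial ref. */
static void model_ref_kill(struct model_ref *r)
{
	assert(!r->dead && r->count > 0);
	r->dead = true;
	r->count--;
}

/* Analogue of percpu_ref_tryget_live(): fails once the ref is dead. */
static bool model_ref_tryget_live(struct model_ref *r)
{
	if (r->dead || r->count == 0)
		return false;
	r->count++;
	return true;
}

static void model_ref_put(struct model_ref *r)
{
	assert(r->count > 0);
	r->count--;
}

/* Hypothetical guarded queue run, as proposed in the thread. Returns true
 * if the queue would have been run (the dispatch itself is elided). */
static bool model_run_hw_queues(struct model_ref *q_usage)
{
	if (!model_ref_tryget_live(q_usage))
		return false;
	/* ... dispatch pending requests here ... */
	model_ref_put(q_usage);
	return true;
}
```

For example, starting from `{ .count = 2 }` (the initial reference plus one allocated request), the first `model_run_hw_queues()` call returns true; after `model_ref_kill()` (the freeze-start analogue) it returns false even though `count` is still 1, so in this model the request the freeze is waiting to drain is never dispatched.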
> >>
> >> Hi Ming,
> >>
> >> I understand the assumption is invalid and there is an issue when using
> >> percpu_ref_tryget_live(). I also understand that we have to run the queue after
> >> starting to freeze it, in order to drain allocated requests and make forward
> >> progress.
> >
> > OK.
> >
> >>
> >>
> >> I am just wondering specifically why "If percpu_ref_exit() is moved to
> >> blk_release_queue(), we still need to move freeing of hw queue's resource into
> >> blk_release_queue() like what the patchset is doing." based on Bart's
> >> statement below:
> >>
> >> "percpu_ref_tryget_live() fails if a per-cpu counter is in the "dead" state.
> >> percpu_ref_kill() changes the state of a per-cpu counter to the "dead" state.
> >> blk_freeze_queue_start() calls percpu_ref_kill(). blk_cleanup_queue() already
> >> calls blk_set_queue_dying() and that last function calls
> >> blk_freeze_queue_start(). So I think that what you wrote is not correct and that
> >> inserting a percpu_ref_tryget_live()/percpu_ref_put() pair in
> >> blk_mq_run_hw_queues() or blk_mq_run_hw_queue() would make a difference and also
> >> that moving the percpu_ref_exit() call into blk_release_queue() makes sense."
> >
> > As you mentioned, percpu_ref_tryget_live() can't be used to avoid running the
> > queue during cleanup, so a queue run can still arrive after the queue has been
> > cleaned up.
> >
> >>
> >> That is, what is the penalty if we do not move freeing of hw queue's resource
> >> into blk_release_queue() like what the patchset is doing in the above situation?
> >
> > A kernel oops, as reported by James: some fields of hctx are used when running
> > the queue, and they may already have been freed by blk_mq_free_queue() in
> > blk_cleanup_queue().
> >
> >>
> >> I ask this question just because I would like to better understand the source
> >> code. Does "hw queue's resource" refer to the code below?
> >>
> >> +        if (hctx->flags & BLK_MQ_F_BLOCKING)
> >> +                cleanup_srcu_struct(hctx->srcu);
> >> +        blk_free_flush_queue(hctx->fq);
> >> +        sbitmap_free(&hctx->ctx_map);
> >
> > Right.
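To illustrate the lifetime rule being discussed, here is a simplified userspace sketch, assuming a single-threaded model in which an integer stands in for the queue's kobject refcount and a heap buffer stands in for hctx resources such as `ctx_map`. All names are hypothetical, not kernel APIs. Freeing the resource only from the release handler (the direction described for this patchset) means a queue run that holds a queue reference can still touch it after cleanup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model of a hw queue context; 'ctx_map' stands in for
 * resources such as hctx->ctx_map or hctx->fq. */
struct model_hctx {
	int *ctx_map;
};

struct model_queue {
	int kref;               /* queue kobject refcount analogue */
	bool cleaned_up;        /* set by the cleanup analogue */
	struct model_hctx hctx;
};

static struct model_queue *model_alloc_queue(void)
{
	struct model_queue *q = calloc(1, sizeof(*q));

	if (!q)
		return NULL;
	q->kref = 1;            /* the owner's reference */
	q->hctx.ctx_map = calloc(4, sizeof(int));
	if (!q->hctx.ctx_map) {
		free(q);
		return NULL;
	}
	return q;
}

/* Release handler: the only place hctx resources are freed in this model. */
static void model_release_queue(struct model_queue *q)
{
	free(q->hctx.ctx_map);
	free(q);
}

static void model_get_queue(struct model_queue *q)
{
	q->kref++;
}

static void model_put_queue(struct model_queue *q)
{
	assert(q->kref > 0);
	if (--q->kref == 0)
		model_release_queue(q);
}

/* Cleanup analogue: marks the queue as cleaned up but, in this model,
 * deliberately leaves hctx resources alone. */
static void model_cleanup_queue(struct model_queue *q)
{
	q->cleaned_up = true;
}

/* A queue run that holds a queue reference; touching ctx_map here would be
 * a use-after-free if model_cleanup_queue() had freed it. */
static int model_run_queue(struct model_queue *q)
{
	q->hctx.ctx_map[0]++;
	return q->hctx.ctx_map[0];
}
```

For example: allocate a queue, take a second reference for a pending queue run, call `model_cleanup_queue()`, and drop the owner's reference; `model_run_queue()` then still accesses valid memory, and the final `model_put_queue()` frees it. Had the cleanup step freed `ctx_map`, the same sequence would be the userspace analogue of the oops reported by James.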
>
> Hi Ming,
>
> Thank you very much for the detailed explanation.
>
> I think maybe I misunderstood your message in the email.
>
> In the other direction posted by Bart below, regardless of which direction is
> better, that implementation does not move freeing of hw queue's resource into
> blk_release_queue(), although percpu_ref_exit() is moved to blk_release_queue().
> That's why I would like to confirm.
>
> https://lore.kernel.org/linux-block/20190401212014.192753-1-bvanassche@xxxxxxx/
>
> In that direction, the friendlier percpu_ref_tryget(), suggested by Jianchao,
> is used. I would just like to confirm that there is no need to move freeing of
> hw queue's resource into blk_release_queue() when the get/put method is
> friendly and fair enough.
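The difference between the two get variants can be sketched with a simplified single-threaded model (hypothetical names; not kernel code): percpu_ref_tryget() only fails once the count has dropped to zero, whereas percpu_ref_tryget_live() also fails as soon as the ref has been killed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-threaded model of a percpu_ref (not kernel code). */
struct model_ref {
	long count;  /* includes the initial reference until kill */
	bool dead;
};

/* percpu_ref_kill() analogue: mark dead and drop the initial reference. */
static void model_ref_kill(struct model_ref *r)
{
	assert(!r->dead && r->count > 0);
	r->dead = true;
	r->count--;
}

/* percpu_ref_tryget() analogue: succeeds unless the count is zero. */
static bool model_ref_tryget(struct model_ref *r)
{
	if (r->count == 0)
		return false;
	r->count++;
	return true;
}

/* percpu_ref_tryget_live() analogue: additionally fails once dead. */
static bool model_ref_tryget_live(struct model_ref *r)
{
	if (r->dead || r->count == 0)
		return false;
	r->count++;
	return true;
}

static void model_ref_put(struct model_ref *r)
{
	assert(r->count > 0);
	r->count--;
}
```

After a kill with one request still outstanding, `model_ref_tryget_live()` fails but `model_ref_tryget()` succeeds; once the last reference is put, `model_ref_tryget()` fails too. The model ignores the per-call cost of a get/put pair on every queue run, which is what the fast-path objection below is about.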

I don't think it is friendly to add such an unnecessary get/put pair to the
very fast path.



Thanks,
Ming Lei


