Re: [PATCH 1/1] blk-mq: fix blk_mq_hw_ctx active request accounting

Hi Jens,

On Sat, May 13, 2023 at 07:52:37PM -0600, Jens Axboe wrote:
> On 5/13/23 4:12 PM, Tian Lan wrote:
> > From: Tian Lan <tian.lan@xxxxxxxxxxxx>
> > 
> > The nr_active counter continues to increase over time, which causes
> > blk_mq_get_tag to hang until the thread is rescheduled to a different
> > core, even though there are still tags available.
> > 
> > kernel-stack
> > 
> >   INFO: task inboundIOReacto:3014879 blocked for more than 2 seconds
> >   Not tainted 6.1.15-amd64 #1 Debian 6.1.15~debian11
> >   "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> >   task:inboundIOReacto state:D stack:0  pid:3014879 ppid:4557 flags:0x00000000
> >     Call Trace:
> >     <TASK>
> >     __schedule+0x351/0xa20
> >     schedule+0x5d/0xe0
> >     io_schedule+0x42/0x70
> >     blk_mq_get_tag+0x11a/0x2a0
> >     ? dequeue_task_stop+0x70/0x70
> >     __blk_mq_alloc_requests+0x191/0x2e0
> > 
> > kprobe output showing that the RQF_MQ_INFLIGHT bit is not cleared before
> > __blk_mq_free_request is called.
> > 
> >   320    320  kworker/29:1H __blk_mq_free_request rq_flags 0x220c0 in-flight 1
> >          b'__blk_mq_free_request+0x1 [kernel]'
> >          b'bt_iter+0x50 [kernel]'
> >          b'blk_mq_queue_tag_busy_iter+0x318 [kernel]'
> >          b'blk_mq_timeout_work+0x7c [kernel]'
> >          b'process_one_work+0x1c4 [kernel]'
> >          b'worker_thread+0x4d [kernel]'
> >          b'kthread+0xe6 [kernel]'
> >          b'ret_from_fork+0x1f [kernel]'
> > 
> > Signed-off-by: Tian Lan <tian.lan@xxxxxxxxxxxx>
> 
> I think this needs:
> 
> Cc: stable@xxxxxxxxxxxxxxx
> Fixes: 2e315dc07df0 ("blk-mq: grab rq->refcount before calling ->fn in blk_mq_tagset_busy_iter")

I am still not sure what is wrong with the above commit, or what the real
cause of Tian's issue is and how this fix addresses it.

> 
> tags, but I'm also now confused as to whether the flush handling part
> of that patch is correct. Ming, what am I missing in terms of not
> honoring the flush ref on put?

From Tian's log, the issue didn't happen on a flush request.

> What happens if two iterators both grab the
> flush at the same time, and then subsequently put them?

Both code paths try to acquire the request refcount and put it only if
the reference was actually grabbed, and for the flush request the
reference release itself is done only in flush_end_io().


Thanks, 
Ming



