Re: [PATCH] blk-mq: don't fail driver tag allocation because of inactive hctx

Ming Lei <tom.leiming@xxxxxxxxx> · Thu, 4 Jun 2020 21:28:50 +0800

On Thu, Jun 4, 2020 at 8:50 PM John Garry <john.garry@xxxxxxxxxx> wrote:
>
>
> >> That's your patch - ok, I can try.
> >>
>
> I still get timeouts and sometimes the same driver tag message occurs:
>
> [ 1014.232417] run queue from wrong CPU 0, hctx active
> [ 1014.237692] run queue from wrong CPU 0, hctx active
> [ 1014.243014] run queue from wrong CPU 0, hctx active
> [ 1014.248370] run queue from wrong CPU 0, hctx active
> [ 1014.253725] run queue from wrong CPU 0, hctx active
> [ 1014.259252] run queue from wrong CPU 0, hctx active
> [ 1014.264492] run queue from wrong CPU 0, hctx active
> [ 1014.269453] irq_shutdown irq146
> [ 1014.272752] CPU55: shutdown
> [ 1014.275552] psci: CPU55 killed (polled 0 ms)
> [ 1015.151530] CPU56: shutdownr=1621MiB/s,w=0KiB/s][r=415k,w=0 IOPS][eta
> 00m:00s]
> [ 1015.154322] psci: CPU56 killed (polled 0 ms)
> [ 1015.184345] CPU57: shutdown
> [ 1015.187143] psci: CPU57 killed (polled 0 ms)
> [ 1015.223388] CPU58: shutdown
> [ 1015.226174] psci: CPU58 killed (polled 0 ms)
> long sleep 8
> [ 1045.234781] scsi_times_out req=0xffff041fa13e6300[r=0,w=0 IOPS][eta
> 04m:30s]
>
> [...]
>
> >>
> >> I thought that if all the sched tags are put, then we should have no driver
> >> tag for that same hctx, right? That seems to coincide with the timeout (30
> >> seconds later)
> >
> > That is weird, if there is driver tag found, that means the request is
> > in-flight and can't be completed by HW.
>
> In blk_mq_hctx_has_requests(), we iterate the sched tags (when
> hctx->sched_tags is set). So can some requests not have a sched tag
> (even for scheduler set for the queue)?
>
> > I assume you have integrated
> > global host tags patch in your test,
>
> No, but the LLDD does not use request->tag - it generates its own.

Besides a wrong queue mapping, another possible reason is that the
generated tag may not be unique. Either of the two may cause such a
timeout issue when the managed interrupt is active.
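
To illustrate the uniqueness point, here is a minimal userspace sketch
(not blk-mq or LLDD code; tag_tracker, MAX_TAGS and the helper names are
hypothetical). It assumes the driver's generated tags fit in a fixed
range and shows how issuing the same tag twice can be caught: without
such a check, the second request would have no completion that maps back
to it and would eventually time out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MAX_TAGS 64 /* hypothetical tag space, chosen to fit one 64-bit word */

struct tag_tracker {
	uint64_t in_flight;          /* bit n set => tag n is in flight */
	const void *owner[MAX_TAGS]; /* request currently owning each tag */
};

static void tag_tracker_init(struct tag_tracker *t)
{
	memset(t, 0, sizeof(*t));
}

/*
 * Claim a driver-generated tag for a request.
 * Returns false if the tag is already in flight, i.e. the generator
 * handed out a non-unique tag.
 */
static bool tag_tracker_issue(struct tag_tracker *t, unsigned int tag,
			      const void *req)
{
	uint64_t bit;

	assert(tag < MAX_TAGS);
	bit = UINT64_C(1) << tag;
	if (t->in_flight & bit)
		return false;

	t->in_flight |= bit;
	t->owner[tag] = req;
	return true;
}

/*
 * Complete a tag. Returns the owning request and frees the tag,
 * or NULL if the tag was not in flight.
 */
static const void *tag_tracker_complete(struct tag_tracker *t,
					unsigned int tag)
{
	uint64_t bit;
	const void *req;

	assert(tag < MAX_TAGS);
	bit = UINT64_C(1) << tag;
	if (!(t->in_flight & bit))
		return NULL;

	req = t->owner[tag];
	t->in_flight &= ~bit;
	t->owner[tag] = NULL;
	return req;
}
```

With this sketch, issuing tag 5 for request A succeeds and issuing tag 5
again for request B is rejected; a completion for tag 5 can only ever be
matched to A. In a driver without such a check, B would be the request
that times out.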

Thanks,
Ming