Re: [PATCH] blk-mq: don't fail driver tag allocation because of inactive hctx

On Thu, Jun 4, 2020 at 8:50 PM John Garry <john.garry@xxxxxxxxxx> wrote:
>
>
> >> That's your patch - ok, I can try.
> >>
>
> I still get timeouts and sometimes the same driver tag message occurs:
>
>   1014.232417] run queue from wrong CPU 0, hctx active
> [ 1014.237692] run queue from wrong CPU 0, hctx active
> [ 1014.243014] run queue from wrong CPU 0, hctx active
> [ 1014.248370] run queue from wrong CPU 0, hctx active
> [ 1014.253725] run queue from wrong CPU 0, hctx active
> [ 1014.259252] run queue from wrong CPU 0, hctx active
> [ 1014.264492] run queue from wrong CPU 0, hctx active
> [ 1014.269453] irq_shutdown irq146
> [ 1014.272752] CPU55: shutdown
> [ 1014.275552] psci: CPU55 killed (polled 0 ms)
> [ 1015.151530] CPU56: shutdownr=1621MiB/s,w=0KiB/s][r=415k,w=0 IOPS][eta
> 00m:00s]
> [ 1015.154322] psci: CPU56 killed (polled 0 ms)
> [ 1015.184345] CPU57: shutdown
> [ 1015.187143] psci: CPU57 killed (polled 0 ms)
> [ 1015.223388] CPU58: shutdown
> [ 1015.226174] psci: CPU58 killed (polled 0 ms)
> long sleep 8
> [ 1045.234781] scsi_times_out req=0xffff041fa13e6300[r=0,w=0 IOPS][eta
> 04m:30s]
>
> [...]
>
> >>
> >> I thought that if all the sched tags are put, then we should have no driver
> >> tag for that same hctx, right? That seems to coincide with the timeout (30
> >> seconds later)
> >
> > That is weird, if there is driver tag found, that means the request is
> > in-flight and can't be completed by HW.
>
> In blk_mq_hctx_has_requests(), we iterate the sched tags (when
> hctx->sched_tags is set). So can some requests not have a sched tag
> (even for scheduler set for the queue)?
>
> > I assume you have integrated
> > global host tags patch in your test,
>
> No, but the LLDD does not use request->tag - it generates its own.

Besides a wrong queue mapping, another possible cause is that the generated
tag may not be unique. Either of the two can cause this kind of timeout when
the managed interrupt is active.
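
One way to check the uniqueness theory is to track every driver-generated tag
while its request is in flight and flag any tag that gets handed out twice.
What follows is only a userspace sketch of that idea, not code from any driver:
MAX_TAGS, struct inflight_tags and the inflight_*() helpers are all
hypothetical names, and the sketch does no locking, so a real driver would need
atomic bit operations or a lock around it.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical size of the driver's private tag space. */
#define MAX_TAGS 4096
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* One bit per tag: set while a request using that tag is in flight. */
struct inflight_tags {
	unsigned long bits[(MAX_TAGS + BITS_PER_WORD - 1) / BITS_PER_WORD];
};

static void inflight_init(struct inflight_tags *t)
{
	memset(t, 0, sizeof(*t));
}

/*
 * Mark a tag as in flight. Returns false if the tag is already in
 * flight, which means the generator handed out a duplicate.
 * Not thread-safe: this sketch assumes a single caller.
 */
static bool inflight_claim(struct inflight_tags *t, unsigned int tag)
{
	unsigned long *word;
	unsigned long mask;

	assert(tag < MAX_TAGS);
	word = &t->bits[tag / BITS_PER_WORD];
	mask = 1UL << (tag % BITS_PER_WORD);

	if (*word & mask)
		return false;	/* duplicate tag detected */
	*word |= mask;
	return true;
}

/*
 * Clear a tag on completion. Returns false if the tag was not in
 * flight, which would point at a double completion or a stale tag.
 */
static bool inflight_release(struct inflight_tags *t, unsigned int tag)
{
	unsigned long *word;
	unsigned long mask;

	assert(tag < MAX_TAGS);
	word = &t->bits[tag / BITS_PER_WORD];
	mask = 1UL << (tag % BITS_PER_WORD);

	if (!(*word & mask))
		return false;
	*word &= ~mask;
	return true;
}
```

Calling inflight_claim() when the tag is generated and inflight_release() on
completion, and logging whenever either returns false, should show whether a
duplicate tag lines up with the scsi_times_out message.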

Thanks,
Ming


