RE: [bug report] Hang on sync after dd

Kashyap Desai <kashyap.desai@xxxxxxxxxxxx> · Tue, 1 Dec 2020 15:56:50 +0530

> @Kashyap, have you guys tested megaraid sas much for this?

John - I tested the V4 version of "scsi: core: Only re-run queue in
scsi_end_request() if device queue is busy" on an MR controller.
I used several reduced device queue depths (1 to 16). I can try the exact
same test case on the MR controller.

>
> Thanks,
> John
>
>
> Block debugfs info is as follows:
>
> estuary:/sys/kernel/debug/block/sda/hctx8$ cat
> active cpu101/ cpu96/ cpu99/ dispatch_busy io_poll sched_tags tags
> busy cpu102/ cpu97/ ctx_map dispatched queued sched_tags_bitmap tags_bitmap
> cpu100/ cpu103/ cpu98/ dispatch flags run state type
> estuary:/sys/kernel/debug/block/sda/hctx8$ cat cpu
> cpu100/ cpu101/ cpu102/ cpu103/ cpu96/ cpu97/ cpu98/ cpu99/
> estuary:/sys/kernel/debug/block/sda/hctx8$ cat cpu
> cpu100/ cpu101/ cpu102/ cpu103/ cpu96/ cpu97/ cpu98/ cpu99/
> estuary:/sys/kernel/debug/block/sda/hctx8$ cat cpu96/
> completed default_rq_list dispatched merged poll_rq_list read_rq_list
> estuary:/sys/kernel/debug/block/sda/hctx8$ cat cpu96/dispatched
> 0 0
> estuary:/sys/kernel/debug/block/sda/hctx8$ cat cpu97/dispatched
> 0 0
> estuary:/sys/kernel/debug/block/sda/hctx8$ cat cpu98/dispatched
> 0 0
> estuary:/sys/kernel/debug/block/sda/hctx8$ cat cpu99/dispatched
> 0 0
> estuary:/sys/kernel/debug/block/sda/hctx8$ cat cpu100/dispatched
> 3 0
> estuary:/sys/kernel/debug/block/sda/hctx8$ cat cpu100/completed
> 2 0
> estuary:/sys/kernel/debug/block/sda/hctx8$
> estuary:/sys/kernel/debug/block/sda/hctx8$
> estuary:/sys/kernel/debug/block/sda/hctx8$ cat state
> SCHED_RESTART

When I tested V3 of "scsi: core: Only re-run queue in scsi_end_request() if
device queue is busy", I noticed a similar hang, and that was fixed in V4
(the final patch).
Let me try on the MR controller one more time. The hctx state SCHED_RESTART
indicates that someone should have kicked off the h/w queue, but the kick
was missed. It is possible that when you revert "scsi: core: Only re-run
queue in scsi_end_request() if device queue is busy", the race condition
window narrows, and that this is actually an existing hidden issue.
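
To check whether the same signature shows up on other hardware queues, the
per-CPU counters can be scanned in one pass rather than with one cat per
file. The following is only a rough sketch, assuming the debugfs layout shown
in the quoted output (hctx*/state and hctx*/cpu*/{dispatched,completed}) and
that each counter file holds whitespace-separated integers. The helper names
read_counts and find_stuck_ctxs are made up for illustration, and summing the
fields of each file is a simplification.

```python
from pathlib import Path


def read_counts(path):
    """Return the whitespace-separated integers found in a counter file."""
    return [int(tok) for tok in Path(path).read_text().split()]


def find_stuck_ctxs(queue_dir):
    """List (hctx, cpu, dispatched, completed, hctx_state) tuples for every
    software queue whose completed count is below its dispatched count.

    queue_dir is a block debugfs directory such as
    /sys/kernel/debug/block/sda (assumed layout, see the text above).
    """
    results = []
    for hctx in sorted(Path(queue_dir).glob("hctx*")):
        state_file = hctx / "state"
        state = state_file.read_text().strip() if state_file.exists() else ""
        for cpu in sorted(hctx.glob("cpu*")):
            try:
                # Assumption: add up all fields in each file.
                dispatched = sum(read_counts(cpu / "dispatched"))
                completed = sum(read_counts(cpu / "completed"))
            except (OSError, ValueError):
                # Skip entries that are missing or not plain integers.
                continue
            if completed < dispatched:
                results.append(
                    (hctx.name, cpu.name, dispatched, completed, state)
                )
    return results
```

Calling find_stuck_ctxs("/sys/kernel/debug/block/sda") as root with debugfs
mounted should flag hctx8/cpu100 for the numbers quoted above (dispatched 3,
completed 2), together with the hctx state.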

> estuary:/sys/kernel/debug/block/sda/hctx8$ ls
> active cpu101 cpu96 cpu99 dispatch_busy io_poll sched_tags tags
> busy cpu102 cpu97 ctx_map dispatched queued sched_tags_bitmap tags_bitmap
> cpu100 cpu103 cpu98 dispatch flags run state type
> estuary:/sys/kernel/debug/block/sda/hctx8$ cat dispatch
> 000000007abb596e {.op=FLUSH, .cmd_flags=PREFLUSH,
> .rq_flags=FLUSH_SEQ|MQ_INFLIGHT|DONTPREP, .state=idle, .tag=21,
> .internal_tag=-1, .cmd=opcode=0x35 35 00 00 00 00 00 00 00 00 00,
> .retries=0, .result = 0x0, .flags=TAGGED|INITIALIZED|3, .timeout=60.000,

If this issue is reproducible, can you check the pending commands? Is there
any pattern in the pending commands?
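
One way to look for such a pattern is to tally the pending requests by block
op and SCSI opcode. The sketch below assumes the dump has the same shape as
the quoted "cat dispatch" output, i.e. one "{...}" entry per request with
".op=" and "opcode=" fields; summarize_pending is a hypothetical helper name.
In the single entry quoted here the op is FLUSH and the opcode is 0x35, which
is SYNCHRONIZE CACHE(10).

```python
import re
from collections import Counter

# Field patterns taken from the quoted dump format (assumed to be stable).
OP_RE = re.compile(r"\.op=(\w+)")
OPCODE_RE = re.compile(r"opcode=(0x[0-9a-fA-F]+)")


def summarize_pending(dump):
    """Count request entries in a debugfs dump by (.op, SCSI opcode)."""
    counts = Counter()
    # Every request entry opens with '{'. Splitting on it keeps an entry
    # together even when the text has been wrapped across several lines.
    for entry in dump.split("{")[1:]:
        op = OP_RE.search(entry)
        opcode = OPCODE_RE.search(entry)
        key = (
            op.group(1) if op else "?",
            opcode.group(1) if opcode else "?",
        )
        counts[key] += 1
    return counts
```

Feeding it the contents of the dispatch file (or the concatenated dumps from
several hctx directories) gives a quick view of whether the stuck requests
are all flushes or a mix of commands.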

> allocated 2208.876 s ago}
> estuary:/sys/kernel/debug/block/sda/hctx8$
>
>
> On cpu100, it seems completed is less than number dispatched.