On 7/21/22 01:07, Geert Uytterhoeven wrote:
> On Wed, Jul 20, 2022 at 8:04 PM Bart Van Assche <bvanassche@xxxxxxx> wrote:
>> That's surprising. Is there anything unusual about the test setup that I
>> should know, e.g. very small number of CPU cores or a very small queue
>> depth of the SATA device? How about adding pr_info() statements at the
>> start and end of the following functions and also before the return
>> statements in these functions to determine where execution of the START
>> command hangs?
>> * sd_start_done().
>> * sd_start_done_work().
>
> None of these functions seem to be called at all?

That's weird. This means that either sd_submit_start() hangs or that
execution of the START command never finishes. The latter is unlikely
since the SCSI error handler is expected to abort commands that hang. It
would also be weird if sd_submit_start() hung before the START command
is submitted, since the code flow for submitting the START command with
patch "scsi: sd: Rework asynchronous resume support" applied is very
similar to the code flow without that patch (calling scsi_execute()).
What is also weird is that there are at least two SATA setups on which
this code works fine, including my Qemu setup.
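
One way to tell the two cases above apart might be to bracket the
submission path with pr_info() calls as well. Below is a minimal
userspace sketch of that entry/exit logging pattern. It is not the
actual sd.c code: demo_submit_start(), the TRACE() helper and the
trace_lines counter are hypothetical names, and pr_info() is stubbed
with fprintf() so that the snippet compiles outside the kernel.

```c
#include <stdio.h>

/*
 * Userspace stand-in so that this compiles outside the kernel. In-kernel
 * code would use the real pr_info() from <linux/printk.h> instead.
 */
#define pr_info(...) fprintf(stderr, __VA_ARGS__)

/* Hypothetical counter, only here so that the logging can be checked. */
static int trace_lines;

/* Hypothetical helper: log the calling function's name and a tag. */
#define TRACE(tag)						\
	do {							\
		pr_info("%s: %s\n", __func__, (tag));		\
		trace_lines++;					\
	} while (0)

/*
 * Hypothetical stand-in for a function such as sd_submit_start():
 * log on entry, before every early return and on normal exit.
 */
static int demo_submit_start(int alloc_failed)
{
	TRACE("enter");
	if (alloc_failed) {
		TRACE("return: request allocation failed");
		return -1;
	}
	/* ... the START command would be submitted here ... */
	TRACE("exit: command submitted");
	return 0;
}
```

If "enter" shows up in the log without a matching "return" or "exit"
line, the function itself hangs. If "exit" shows up but sd_start_done()
never logs anything, the command was submitted but never completed.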
Although it is possible to enable tracing at boot time, adding the
following parameters to the kernel command line would generate too much
logging data:
tp_printk
trace_event=block_rq_complete,block_rq_error,block_rq_insert,block_rq_issue,block_rq_merge,block_rq_remap,block_rq_requeue,scsi_dispatch_cmd_done,scsi_dispatch_cmd_start,scsi_eh_wakeup,scsi_dispatch_cmd_error,scsi_dispatch_cmd_timeout
scsi_mod.scsi_logging_level=32256
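
For reference, the scsi_logging_level value above is a bit mask with one
3-bit field per logging category. The following is a small userspace
sketch that decodes such a value. The shift values are written down as I
recall them from drivers/scsi/scsi_logging.h, so please treat them as an
assumption and check them against your tree; log_level_at() and
dump_logging_level() are hypothetical helper names.

```c
#include <stdio.h>

/*
 * Bit offsets of the logging categories, as I recall them from
 * drivers/scsi/scsi_logging.h (an assumption; please check your tree).
 * Each category occupies a 3-bit field holding a level from 0 to 7.
 */
struct log_field {
	const char *name;
	unsigned int shift;
};

static const struct log_field log_fields[] = {
	{ "ERROR", 0 }, { "TIMEOUT", 3 }, { "SCAN", 6 },
	{ "MLQUEUE", 9 }, { "MLCOMPLETE", 12 }, { "LLQUEUE", 15 },
	{ "LLCOMPLETE", 18 }, { "HLQUEUE", 21 }, { "HLCOMPLETE", 24 },
	{ "IOCTL", 27 },
};

/* Hypothetical helper: extract the 3-bit level stored at 'shift'. */
static unsigned int log_level_at(unsigned int mask, unsigned int shift)
{
	return (mask >> shift) & 7;
}

/* Hypothetical helper: print every category with a nonzero level. */
static void dump_logging_level(unsigned int mask)
{
	size_t i;

	for (i = 0; i < sizeof(log_fields) / sizeof(log_fields[0]); i++) {
		unsigned int lvl = log_level_at(mask, log_fields[i].shift);

		if (lvl)
			printf("%s=%u\n", log_fields[i].name, lvl);
	}
}
```

Under those assumptions, dump_logging_level(32256) reports only MLQUEUE
and MLCOMPLETE, both at level 7, since 32256 = (7 << 9) + (7 << 12).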
I'm not sure what the best way is to proceed since I cannot reproduce
this issue.
Bart.