On 7/20/22 10:44, Geert Uytterhoeven wrote:
On Wed, Jul 20, 2022 at 6:51 PM Bart Van Assche <bvanassche@xxxxxxx> wrote:
I'm not familiar with the SATA code but from a quick look it seems like
the above code is only triggered from inside the ATA error handler
(ata_do_eh() -> ata_eh_recover() -> ata_eh_revalidate_and_attach() ->
schedule_work(&(ap->scsi_rescan_task) -> ata_scsi_dev_rescan()). It
doesn't seem normal to me that the ATA error handler gets invoked during
a resume. How about testing the following two code changes?
Thanks for your suggestions!
* In sd_start_stop_device(), change "return sd_submit_start(sdkp, cmd,
sizeof(cmd))" into "sd_submit_start(sdkp, cmd, sizeof(cmd))" and below
that call add "flush_work(&sdkp->start_done_work)". This makes
sd_start_stop_device() again synchronous. This will learn us whether the
behavior change is caused by submitting the START command from another
context or by not waiting until the START command has finished.
Unfortunately this doesn't have any impact.
* Back out the above change, change "return sd_submit_start(sdkp, cmd,
sizeof(cmd))" again into "sd_submit_start(sdkp, cmd, sizeof(cmd))" and
below that statement add a call to
scsi_run_queue(sdkp->device->request_queue). If this change helps it
(that's the static scsi_run_queue() in drivers/scsi/scsi_lib.c?)
means that the scsi_run_queue() call is necessary to prevent reordering
of the START command with other SCSI commands.
Unfortunately this doesn't have any impact either.
That's surprising. Is there anything unusual about the test setup that I
should know, e.g. very small number of CPU cores or a very small queue
depth of the SATA device? How about adding pr_info() statements at the
start and end of the following functions and also before the return
statements in these functions to determine where execution of the START
command hangs?
* sd_start_done().
* sd_start_done_work().
Thanks,
Bart.