Hoi Bart,

On Wed, Jul 20, 2022 at 8:04 PM Bart Van Assche <bvanassche@xxxxxxx> wrote:
> On 7/20/22 10:44, Geert Uytterhoeven wrote:
> > On Wed, Jul 20, 2022 at 6:51 PM Bart Van Assche <bvanassche@xxxxxxx> wrote:
> >> I'm not familiar with the SATA code but from a quick look it seems like
> >> the above code is only triggered from inside the ATA error handler
> >> (ata_do_eh() -> ata_eh_recover() -> ata_eh_revalidate_and_attach() ->
> >> schedule_work(&(ap->scsi_rescan_task) -> ata_scsi_dev_rescan()). It
> >> doesn't seem normal to me that the ATA error handler gets invoked during
> >> a resume. How about testing the following two code changes?
> >
> > Thanks for your suggestions!
> >
> >> * In sd_start_stop_device(), change "return sd_submit_start(sdkp, cmd,
> >> sizeof(cmd))" into "sd_submit_start(sdkp, cmd, sizeof(cmd))" and below
> >> that call add "flush_work(&sdkp->start_done_work)". This makes
> >> sd_start_stop_device() again synchronous. This will learn us whether the
> >> behavior change is caused by submitting the START command from another
> >> context or by not waiting until the START command has finished.
> >
> > Unfortunately this doesn't have any impact.
> >
> >> * Back out the above change, change "return sd_submit_start(sdkp, cmd,
> >> sizeof(cmd))" again into "sd_submit_start(sdkp, cmd, sizeof(cmd))" and
> >> below that statement add a call to
> >> scsi_run_queue(sdkp->device->request_queue). If this change helps it
> >
> > (that's the static scsi_run_queue() in drivers/scsi/scsi_lib.c?)
> >
> >> means that the scsi_run_queue() call is necessary to prevent reordering
> >> of the START command with other SCSI commands.
> >
> > Unfortunately this doesn't have any impact either.
>
> That's surprising. Is there anything unusual about the test setup that I
> should know, e.g. very small number of CPU cores or a very small queue
> depth of the SATA device?

> How about adding pr_info() statements at the
> start and end of the following functions and also before the return
> statements in these functions to determine where execution of the START
> command hangs?
> * sd_start_done().
> * sd_start_done_work().

None of these functions seem to be called at all?

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@xxxxxxxxxxxxxx

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds