Hoi Bart,

On Thu, Jul 21, 2022 at 8:15 PM Bart Van Assche <bvanassche@xxxxxxx> wrote:
> On 7/21/22 01:07, Geert Uytterhoeven wrote:
> > On Wed, Jul 20, 2022 at 8:04 PM Bart Van Assche <bvanassche@xxxxxxx> wrote:
> >> That's surprising. Is there anything unusual about the test setup that I
> >> should know, e.g. very small number of CPU cores or a very small queue
> >> depth of the SATA device? How about adding pr_info() statements at the
> >> start and end of the following functions and also before the return
> >> statements in these functions to determine where execution of the START
> >> command hangs?
> >> * sd_start_done().
> >> * sd_start_done_work().
> >
> > None of these functions seem to be called at all?
>
> That's weird. This means that either sd_submit_start() hangs or that the
> execution of the START command never finishes. The latter is unlikely
> since the SCSI error handler is assumed to abort commands that hang. It
> would also be weird if sd_submit_start() would hang before the START
> command is submitted since the code flow for submitting the START
> command is very similar to the code flow for submitting the START
> command without patch "scsi: sd: Rework asynchronous resume support"
> (calling scsi_execute()).

I think you misunderstood: none of these functions seem to be called,
even when reading from hard drive works fine.

> What is also weird is that there are at least two SATA setups on which
> this code works fine, including my Qemu setup.
>
> Although it is possible to enable tracing at boot time, adding the
> following parameters to the kernel command line would generate too much
> logging data:
>
> tp_printk
> trace_event=block_rq_complete,block_rq_error,block_rq_insert,block_rq_issue,block_rq_merge,block_rq_remap,block_rq_requeue,scsi_dispatch_cmd_done,scsi_dispatch_cmd_start,scsi_eh_wakeup,scsi_dispatch_cmd_error,scsi_dispatch_cmd_timeout
> scsi_mod.scsi_logging_level=32256
>
> I'm not sure what the best way is to proceed since I cannot reproduce
> this issue.

During s2idle, the following trace data is generated:

    kworker/u16:9-325 [000] ...2. 230.478731: block_rq_issue: 8,0 N 0 () 0 + 0 [kworker/u16:9]
    kworker/u16:9-325 [000] ...2. 230.478745: scsi_dispatch_cmd_start: host_no=0 channel=0 id=0 lun=0 data_sgl=0 prot_sgl=0 prot_op=SCSI_PROT_NORMAL driver_tag=0 scheduler_tag=0 cmnd=(SYNCHRONIZE_CACHE - raw=35 00 00 00 00 00 00 00 00 00)
    <idle>-0 [007] d.h3. 230.478832: scsi_dispatch_cmd_done: host_no=0 channel=0 id=0 lun=0 data_sgl=0 prot_sgl=0 prot_op=SCSI_PROT_NORMAL driver_tag=0 scheduler_tag=0 cmnd=(SYNCHRONIZE_CACHE - raw=35 00 00 00 00 00 00 00 00 00) result=(driver=DRIVER_OK host=DID_OK message=COMMAND_COMPLETE status=SAM_STAT_GOOD)
    <idle>-0 [000] ..s2. 230.478851: block_rq_complete: 8,0 N () 18446744073709551615 + 0 [0]
    kworker/u16:9-325 [000] ...2. 230.483134: block_rq_issue: 8,0 N 0 () 0 + 0 [kworker/u16:9]
    kworker/u16:9-325 [000] ...2. 230.483136: scsi_dispatch_cmd_start: host_no=0 channel=0 id=0 lun=0 data_sgl=0 prot_sgl=0 prot_op=SCSI_PROT_NORMAL driver_tag=0 scheduler_tag=1 cmnd=(START_STOP - raw=1b 00 00 00 00 00)
    <idle>-0 [007] d.h3. 230.624530: scsi_dispatch_cmd_done: host_no=0 channel=0 id=0 lun=0 data_sgl=0 prot_sgl=0 prot_op=SCSI_PROT_NORMAL driver_tag=0 scheduler_tag=1 cmnd=(START_STOP - raw=1b 00 00 00 00 00) result=(driver=DRIVER_OK host=DID_OK message=COMMAND_COMPLETE status=SAM_STAT_GOOD)
    <idle>-0 [000] d.s4. 230.624634: scsi_eh_wakeup: host_no=0
    <idle>-0 [000] ..s2. 230.624642: block_rq_complete: 8,0 N () 18446744073709551615 + 0 [0]
    kworker/u16:14-1027 [007] d..3. 231.393642: scsi_eh_wakeup: host_no=0

When reading from hard drive after s2idle, no more trace data is generated.

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@xxxxxxxxxxxxxx

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds