Hi Hans,

On Sat, Aug 20, 2022 at 5:37 PM Hans de Goede <hdegoede@xxxxxxxxxx> wrote:
> On 8/16/22 19:26, Bart Van Assche wrote:
> > Although patch "Rework asynchronous resume support" eliminates the delay
> > for some ATA disks after resume, it causes resume of ATA disks to fail
> > on other setups. See also:
> > * "Resume process hangs for 5-6 seconds starting sometime in 5.16"
> >   (https://bugzilla.kernel.org/show_bug.cgi?id=215880).
> > * Geert's regression report
> >   (https://lore.kernel.org/linux-scsi/alpine.DEB.2.22.394.2207191125130.1006766@xxxxxxxxxxxxxx/).
> >
> > This is what I understand about this issue:
> > * During resume, ata_port_pm_resume() starts the SCSI error handler.
> >   This changes the SCSI host state into SHOST_RECOVERY and causes
> >   scsi_queue_rq() to return BLK_STS_RESOURCE.
> > * sd_resume() calls sd_start_stop_device() for ATA devices. That
> >   function in turn calls sd_submit_start() which tries to submit a START
> >   STOP UNIT command. That command can only be submitted after the SCSI
> >   error handler has changed the SCSI host state back to SHOST_RUNNING.
> > * The SCSI error handler runs on its own thread and calls
> >   schedule_work(&(ap->scsi_rescan_task)). That causes
> >   ata_scsi_dev_rescan() to be called from the context of a kernel
> >   workqueue. That call hangs in blk_mq_get_tag(). I'm not sure why -
> >   maybe because all available tags have been allocated by
> >   sd_submit_start() calls (this is a guess).
> >
> > Cc: Damien Le Moal <damien.lemoal@xxxxxxxxxxxxxxxxxx>
> > Cc: Hannes Reinecke <hare@xxxxxxx>
> > Cc: Geert Uytterhoeven <geert@xxxxxxxxxxxxxx>
> > Cc: gzhqyz@xxxxxxxxx
> > Reported-by: Geert Uytterhoeven <geert@xxxxxxxxxxxxxx>
> > Reported-by: gzhqyz@xxxxxxxxx
> > Fixes: 88f1669019bd ("scsi: sd: Rework asynchronous resume support"; v6.0-rc1~114^2~68)
> > Signed-off-by: Bart Van Assche <bvanassche@xxxxxxx>
>
> As reported here I've been seeing tasks block/hang on IO
> to a sata disk on a system with / on a NVME (which keeps
> the system alive except for the SATA disk accessing tasks):
>
> https://lore.kernel.org/regressions/dd6844e7-f338-a4e9-2dad-0960e25b2ca1@xxxxxxxxxx/
>
> I'm running 6.0-rc1 with this patch added now and so far
> I've not seen the problem re-occur.
>
> I was also seeing 6.0 suspend/resume issues on 2 laptops with
> sata disks (rather than NVME) which I did not yet get around
> to collecting logs from / reporting. I'm happy to report that
> those suspend/resume issues are also fixed by this:

It looks like there is a (different) regression in v6.1-rc1 related to
s2idle and s2ram, which is not fixed by this patch. In fact, it also
happens on boards where SATA is not used; it is just less likely to
happen on the non-SATA boards. I still have to bisect it, which may
take some time, as the issue is not 100% reproducible.

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@xxxxxxxxxxxxxx

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds