On Fri, Jan 10, 2014 at 6:25 PM, Phillip Susi <psusi@xxxxxxxxxx> wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA512 > > On 01/10/2014 05:26 PM, Dan Williams wrote: >> I was going to comment that this leaves us open to a race to >> submit new i/o before recovery starts, but scsi_schedule_eh already >> blocks new i/o, so I think we're good. I think it deserves a >> comment here about why it's safe to not care. > > I don't follow, could you explain? The question I have when reading "__ata_port_resume_common(ap, mesg, &dontcare);" is what makes it safe for the port to not actually be resumed upon return. The answer I believe is that the host is guaranteed to be in the SHOST_RECOVERY state upon return and no new i/o submissions will occur until the error handler has a chance to run. >>> static int ata_port_resume(struct device *dev) { int rc; >>> >>> + if (pm_runtime_suspended(dev)) + return 0; >> >> Why? If we dpm_resume() a port that was rpm_suspend()ed the state >> needs to be reset to active. I think this check also forces the >> port to resume twice, once for system resume and again for the >> eventual runtime_resume. > > It stops the first resume by returning early, so only the second one > happens. Got it, but it looks odd. > >> ...so, the pm_runtime_suspended() check should go and something >> like this folded in: >> >> diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c >> index 92d7797223be..a6cc107c9434 100644 --- >> a/drivers/ata/libata-eh.c +++ b/drivers/ata/libata-eh.c @@ -4143,5 >> +4143,11 @@ static void ata_eh_handle_port_resume(struct ata_port >> *ap) ap->pm_result = NULL; } spin_unlock_irqrestore(ap->lock, >> flags); + + if (rc == 0) { + >> pm_runtime_disable(dev); + >> pm_runtime_set_active(dev); + >> pm_runtime_enable(dev); + } } #endif /* CONFIG_PM */ > > Ahh, and that will stop the second resume, rather than the previous > change to stop the first? > > Don't we want to stop the first rather than the second? Otherwise if > the port is runtime suspended at system suspend time ( maybe no drive > connected to it? ) then there is no sense in resuming it at system > resume time. Hmm, if the drive disconnected during suspend I think we want to find that out sooner rather than later. The changelog should really read "Don't block the resume path waiting for the disk re-establish its link". Given it's async do we gain that much by deferring a one time check of the ports at resume? At the very least I think tying rpm_resume to dpm_resume like that is an additional change to defer to a separate patch. -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html