Re: [PATCH 3/6] libata: resume in the background

Dan Williams <dan.j.williams@xxxxxxxxx> · Fri, 10 Jan 2014 19:11:20 -0800

On Fri, Jan 10, 2014 at 6:25 PM, Phillip Susi <psusi@xxxxxxxxxx> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA512
>
> On 01/10/2014 05:26 PM, Dan Williams wrote:
>> I was going to comment that this leaves us open to a race to
>> submit new i/o before recovery starts, but scsi_schedule_eh already
>> blocks new i/o, so I think we're good.  I think it deserves a
>> comment here about why it's safe to not care.
>
> I don't follow, could you explain?

The question I have when reading "__ata_port_resume_common(ap, mesg,
&dontcare);" is what makes it safe for the port to not actually be
resumed upon return.  The answer I believe is that the host is
guaranteed to be in the SHOST_RECOVERY state upon return and no new
i/o submissions will occur until the error handler has a chance to
run.

>>> static int ata_port_resume(struct device *dev) { int rc;
>>>
>>> +       if (pm_runtime_suspended(dev)) +               return 0;
>>
>> Why?  If we dpm_resume() a port that was rpm_suspend()ed the state
>> needs to be reset to active.  I think this check also forces the
>> port to resume twice, once for system resume and again for the
>> eventual runtime_resume.
>
> It stops the first resume by returning early, so only the second one
> happens.

Got it, but it looks odd.

>
>> ...so, the pm_runtime_suspended() check should go and something
>> like this folded in:
>>
>> diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
>> index 92d7797223be..a6cc107c9434 100644 ---
>> a/drivers/ata/libata-eh.c +++ b/drivers/ata/libata-eh.c @@ -4143,5
>> +4143,11 @@ static void ata_eh_handle_port_resume(struct ata_port
>> *ap) ap->pm_result = NULL; } spin_unlock_irqrestore(ap->lock,
>> flags); + +       if (rc == 0) { +
>> pm_runtime_disable(dev); +
>> pm_runtime_set_active(dev); +
>> pm_runtime_enable(dev); +       } } #endif /* CONFIG_PM */
>
> Ahh, and that will stop the second resume, rather than the previous
> change to stop the first?
>
> Don't we want to stop the first rather than the second?  Otherwise if
> the port is runtime suspended at system suspend time ( maybe no drive
> connected to it? ) then there is no sense in resuming it at system
> resume time.

Hmm, if the drive disconnected during suspend I think we want to find
that out sooner rather than later.  The changelog should really read
"Don't block the resume path waiting for the disk re-establish its
link".

Given it's async do we gain that much by deferring a one time check of
the ports at resume? At the very least I think tying rpm_resume to
dpm_resume like that is an additional change to defer to a separate
patch.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html