RE: [PATCH/RESEND 1/2] Hard disk S3 resume time optimization

"Brandt, Todd E" <todd.e.brandt@xxxxxxxxx> · Tue, 10 Sep 2013 20:23:35 +0000

Good suggestion on the reuse of the existing ata_port_request_pm function. I have this habit of introducing new changes in such a way that they can be easily turned on and off for testing, but that's not appropriate here. I'll redo this patch with the functionality weaved into all the existing functions instead of creating new ones. Thanks for the feedback!

Todd Brandt
Linux Kernel Developer OTC, Hillsboro OR
https://opensource.intel.com/linux-wiki/ToddBrandt

________________________________________
From: linux-scsi-owner@xxxxxxxxxxxxxxx [linux-scsi-owner@xxxxxxxxxxxxxxx] on behalf of Bartlomiej Zolnierkiewicz [b.zolnierkie@xxxxxxxxxxx]
Sent: Tuesday, September 10, 2013 5:59 AM
To: todd.e.brandt@xxxxxxxxxxxxxxx
Cc: linux-scsi@xxxxxxxxxxxxxxx
Subject: Re: [PATCH/RESEND 1/2] Hard disk S3 resume time optimization

On Friday, September 06, 2013 03:45:35 PM Todd E Brandt wrote:
> On Fri, Sep 06, 2013 at 06:54:48PM +0200, Bartlomiej Zolnierkiewicz wrote:
> >
> > Hi,
> >
> > On Thursday, September 05, 2013 05:38:53 PM Todd E Brandt wrote:
> > > This is the final draft of the non-blocking hard disk resume patch. I've
> > > included some performance results to demonstrate the real benefits
> > > of this patch. Please note that this patch provides a MASSIVE performance
> > > improvement in hard disk resume. It's too valuable to ignore, so I really
> > > need the help of the maintainers to get this implemented. Even if this
> > > patch is deemed the wrong approach I hope you won't abandon the idea
> > > altogether. There is so much potential in this kind of optimization and
> > > I'm highly motivated to make this work.
> > >
> > > To demonstrate the substantial performance improvement I've run the
> > > AnalyzeSuspend tool on three different platforms patched with the new
> > > behavior. Each is running Ubuntu Raring with a kernel built from the
> > > upstream kernel source.
> > >
> > > The complete analysis and graphical outputs of the tool are available
> > > online at 01.org:
> > > https://01.org/suspendresume/blogs/tebrandt/2013/hard-disk-resume-optimization-simpler-approach
> >
> > Please include changes descriptions from the link above in the actual patch
> > descriptions.
> >
> > "The essential issue behind hard disks' lengthy resume time is the ata
> > port driver blocking until the ATA port hardware is finished coming online.
> > So the kernel isn't really doing anything during all those seconds that
> > the disks are resuming, it's just blocking until the hardware says it's
> > ready to accept commands. This patch changes the ATA port driver to issue
> > the wakeup command and then return immediately. Any commands issued to
> > the hardware will be queued up and will be executed once the port is
> > physically online. Thus no information is lost, and although the wait
> > time itself isn't removed, it doesn't hold up the rest of the system
> > which can function on what's left in RAM and cache."
> >
>
>  What happens when somebody requests suspend while ATA port resume is still
>  running?
>
> The suspend request is queued until the resume has completed. I've tested
> that case very heavily. Basically if you do two suspends in a row (within
> seconds of each other) you lose any performance benefit, but that's a use
> case that should happen only very rarerly (and wouldn't be expected to
> be of any power benefit since too many sequential suspends actually
> takes more power than just letting the machine stay in run mode).

OK.

> >
> > > Here's a synopsis of the results.
> > >
> > > -------------------------------------------------------
> > > [Computer One]
> > > PLATFORM: Ubuntu Raring Ringtail (13.04)
> > > KERNEL: 3.11.0-rc7
> > > CPU: Intel(R) Core(TM) i7-3960X CPU @ 3.30GHz
> > > SATA: Intel C600/X79 series chipset 6-Port SATA AHCI (r5)
> > > DISK CONFIG:
> > >     ATA1: 240 GB SSD
> > >     ATA2: 3 TB Hard Disk
> > >     ATA3: 500 GB Hard Disk
> > >     ATA4: DVD-ROM (with cd inserted)
> > >     ATA5: 2 TB Hard Disk
> > >     ATA6: 1 TB Hard Disk
> > > RESUME TIME WITHOUT PATCH: 11656 ms
> > > RESUME TIME WITH PATCH: 1110 ms
>
>  These results are with both patches applied, could you tell what is
>  the improvement from the patch #1 alone?
>
> With either patch alone there is no benefit. I've tested with each
> patch applied independently to verify that they function, but you
> can't get the benefit without both.
>
>  Where is the 11s delay coming from? Are we doing the resume for all
>  ports sequentially instead of in parallel? In such case you should be
>  fixing the power management layer instead.
>
> The ATA ports are all resumed in parallel. The long resume time
> comes from extremely large drives. The ATA port wakeup delay is
> the hardware running an internal subroutine to reinitialize the
> connection, which I guess takes longer for bigger disks. I'm not
> completely sure what it's doing, I just know that there's nothing
> software can do to speed it up.

OK.

> > > IMPROVEMENT: 10.5X speedup
> > >
> > > -------------------------------------------------------
> > > [Computer Two]
> > > PLATFORM: Ubuntu Raring Ringtail (13.04)
> > > KERNEL: 3.11.0-rc7
> > > CPU: Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz
> > > SATA: Intel 7 Series/C210 Series Chipset Family 6-port SATA [AHCI mode] (r4)
> > > DISK CONFIG:
> > >     ATA1: 320 GB Hard Disk
> > >     ATA2 - ATA6: Empty slots
> > > RESUME TIME WITHOUT PATCH: 5416 ms
> > > RESUME TIME WITH PATCH: 448 ms
> > >
> > > IMPROVEMENT: 12X speedup
> > >
> > > -------------------------------------------------------
> > > [Computer Three]
> > > PLATFORM: Ubuntu Raring Ringtail (13.04)
> > > KERNEL: 3.11.0-rc7
> > > CPU: Intel(R) Core(TM) i7-4770S CPU @ 3.10GHz
> > > SATA: Intel Lynx Point 6-port SATA Controller 1 [AHCI mode] (r2)
> > > DISK CONFIG:
> > >     ATA1,3,4,6: Empty Slots
> > >     ATA2: DVD-ROM (empty)
> > >     ATA5: 500 GB Hard Disk
> > > RESUME TIME WITHOUT PATCH: 5385 ms
> > > RESUME TIME WITH PATCH: 688 ms
> > >
> > > IMPROVEMENT: 7.8X speedup
> > >
> > > -------------------------------------------------------
> > >
> > > Signed-off-by: Todd Brandt <todd.e.brandt@xxxxxxxxx>
> > > Signed-off-by: Arjan van de Ven <arjan@xxxxxxxxxxxxxxx>
> > >
> > >  drivers/ata/libata-core.c | 36 +++++++++++++++++++++++++++++++++++-
> > >  1 file changed, 35 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
> > > index c24354d..6cf0c15 100644
> > > --- a/drivers/ata/libata-core.c
> > > +++ b/drivers/ata/libata-core.c
> > > @@ -5415,6 +5415,40 @@ static int ata_port_resume(struct device *dev)
> > >   return rc;
> > >  }
> > >
> > > +static int ata_port_resume_async(struct device *dev)
> > > +{
> > > + struct ata_port *ap = to_ata_port(dev);
> > > + struct ata_link *link;
> > > + unsigned long flags;
> > > + int ret = 0;
> > > +
> > > + if (ap->pflags & ATA_PFLAG_PM_PENDING) {
> > > +         WARN_ON(ap->pflags & ATA_PFLAG_PM_PENDING);
> > > +         ret = -EAGAIN;
> > > +         goto out;
> > > + }
> > > +
> > > + spin_lock_irqsave(ap->lock, flags);
> > > +
> > > + ap->pm_mesg = PMSG_RESUME;
> > > + ap->pm_result = NULL;
> > > + ap->pflags |= ATA_PFLAG_PM_PENDING;
> > > + ata_for_each_link(link, ap, HOST_FIRST) {
> > > +         link->eh_info.action |= ATA_EH_RESET;
> > > +         link->eh_info.flags |= ATA_EHI_NO_AUTOPSY | ATA_EHI_QUIET;
> > > + }
> > > +
> > > + ata_port_schedule_eh(ap);
> > > +
> > > + spin_unlock_irqrestore(ap->lock, flags);
> >
>
>  Why have you open-coded ata_port_request_pm() instead of just re-using it
>  but with "async" parameter set?
>
> Because the async mode call requires a place to capture the value, and in
> this case I want to ignore it. The PM subsystem doesn't supply any method
> of checking on the status of a resume callback that's running asynchronously
> because it has no way of knowing. So without altering the pm code there's
> no point in collecting it.

Then how's about changing ata_port_request_pm() to have async bool flag and
another argument (i.e. async_rc) which will be used to store the return value?

Something like:

static int ata_port_request_pm(struct ata_port *ap, pm_message_t mesg,
                               unsigned int action, unsigned int ehi_flags,
                               bool async, int *async_rc)
{
        struct ata_link *link;
        unsigned long flags;
        int rc = 0;

        /* Previous resume operation might still be in
         * progress.  Wait for PM_PENDING to clear.
         */
        if (ap->pflags & ATA_PFLAG_PM_PENDING) {
                if (async) {
                        if (async_rc) {
                                *async_rc = -EAGAIN;
                                return 0;
                        }
                        return -EAGAIN;
                }
                ata_port_wait_eh(ap);
                WARN_ON(ap->pflags & ATA_PFLAG_PM_PENDING);
        }

        /* request PM ops to EH */
        spin_lock_irqsave(ap->lock, flags);

        ap->pm_mesg = mesg;
        if (async)
                ap->pm_result = async_rc ? async_rc : NULL;
        else
                ap->pm_result = &rc;

        ap->pflags |= ATA_PFLAG_PM_PENDING;
        ata_for_each_link(link, ap, HOST_FIRST) {
                link->eh_info.action |= action;
                link->eh_info.flags |= ehi_flags;
        }

        ata_port_schedule_eh(ap);

        spin_unlock_irqrestore(ap->lock, flags);

        /* wait and check result */
        if (!async) {
                ata_port_wait_eh(ap);
                WARN_ON(ap->pflags & ATA_PFLAG_PM_PENDING);
        }

        return rc;
}

> >
> > > + out:
> > > + pm_runtime_disable(dev);
> > > + pm_runtime_set_active(dev);
> > > + pm_runtime_enable(dev);
> > > + return ret;
> > >
> > > +}
> > >
> > >  /*
> > >   * For ODDs, the upper layer will poll for media change every few seconds,
> > >   * which will make it enter and leave suspend state every few seconds. And
> > > @@ -5451,7 +5485,7 @@ static int ata_port_runtime_resume(struct device *dev)
> > >
> > >  static const struct dev_pm_ops ata_port_pm_ops = {
> > >   .suspend = ata_port_suspend,
> > > - .resume = ata_port_resume,
> > > + .resume = ata_port_resume_async,
> >
>
>  ->resume will now return success even though it can later fail during
>  the async operation (error value will be lost), this is no-go, sorry.
>
> Yes, that's the nature of the asynchronous paradigm. The return value in this
> case is the status of whether the resume was initiated, rather than if it was
> completed. I'm letting the scsi and ata drivers assume control over their
> own error reporting in this case (which they already do). If you look at
> the ata_port resume code you'll see that the ata_port_resume callback returns
> the status of the ahci_port_resume callback, which is always 0. So for SATA
> disks the returned value for the resume callback is always 0. Error
> handling and reporting is taken care of in the scsi error handler thread,
> which runs separately from the pm core. I agree that in general it's a
> "no-go" to ignore a return value, but in this specific case for this
> specific driver, the return value is erronuous anyway, so there's no
> harm in ignoring it.

Makes sense, thanks for explaining this.

Best regards,
--
Bartlomiej Zolnierkiewicz
Samsung R&D Institute Poland
Samsung Electronics

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html