[Bug 215880] Resume process hangs for 5-6 seconds starting sometime in 5.16

bugzilla-daemon@xxxxxxxxxx · Mon, 10 Jul 2023 00:47:13 +0000

https://bugzilla.kernel.org/show_bug.cgi?id=215880

--- Comment #48 from Damien Le Moal (damien.lemoal@xxxxxxx) ---
(In reply to Paul Ausbeck from comment #47)
> The PCI/GPU messages are only connected to the ATA messages in the sense
> that the PCI/GPU activity is unexpectedly deferred until after ATA init is
> complete. Since resuming of devices is supposedly asynchronous, one would
> have thought that PCI/GPU init activity would be completed long before hdd
> spin up is complete.

Maybe try enabling PM debug messages? That could tell us what is going on.
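
As a minimal sketch of one way to do that, assuming a kernel built with
CONFIG_PM_SLEEP_DEBUG so that /sys/power/pm_debug_messages exists, and assuming
the script runs as root. The helper name enable_pm_debug is hypothetical and
only used for illustration:

```python
from pathlib import Path


def enable_pm_debug(sys_power="/sys/power"):
    """Turn on PM debug messages by writing 1 to pm_debug_messages.

    sys_power is a parameter only so the helper can be pointed at a
    different directory for testing; on a real system it is /sys/power.
    Returns False if the knob is absent (for example, a kernel built
    without CONFIG_PM_SLEEP_DEBUG), True after a successful write.
    """
    knob = Path(sys_power) / "pm_debug_messages"
    if not knob.exists():
        return False
    knob.write_text("1\n")
    return True


# Typical use (as root), before suspending the machine:
#     enable_pm_debug()
# then suspend, resume, and read the kernel log (dmesg).
```

The extra messages end up in the kernel log, so the log captured right after a
resume is what would be worth attaching to the bug.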

> I realize that deferring drive spin up until needed would not be easy,
> that's why I called such an idea a tour de force. In the past it may have
> been common for an ata device to not resume reliably, but today even the 10
> year old ata devices on my Ivy Bridge machine resume just as reliably as
> they normally operate, which is quite reliably. If it weren't for power
> outages, I'd have years of continuous uptime. It's just a thought, but it
> may be time to revisit how disks, especially spinning disks, are resumed. It
> seems to me that the chance of a hdd failure during resume is not any
> greater than at any other time. It should be theoretically possible to queue
> up resume commands and execute them only when needed to service actual
> demand. The latency would have to eventually be absorbed, but if the api's
> are designed and implemented properly that would just happen when needed.

Spinning up the disk on resume is needed because many drives require the disk to
be spun up to reply to commands, and that applies even to commands that do not
access the media (e.g. IDENTIFY, READ LOG, etc). The reason is that many drives
save their metadata on a reserved area of the media (the spinning disk).

So even if the kernel were to explicitly defer spinning up the disk, the disk
itself would in many instances spin up automatically, and that would manifest
itself as a long delay in the reply to the initial IDENTIFY command issued
during the rescan initiated by resume. Delaying that rescan is not desirable
either, as that would force the PM code to report "OK, this device is resumed
and ready" while it is in fact not ready at all to receive commands. So the
"tour de force" is really not desirable in practice.

You can likely avoid the HDD spin-up on resume by soft-removing the device
before suspend. We could then add a libata option to ignore some ports on
resume so that they are not probed. But reusing the drive after resume would
then require a manual rescan for the disk to show up again. And all of this
also assumes that there is no file system mounted on the disk, of course.
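
A rough sketch of the soft-remove and manual rescan steps, assuming that
"soft-removing" means writing to the SCSI device's sysfs delete attribute and
that the rescan goes through the SCSI host's scan attribute. The helper names
and the disk/host names (e.g. "sdb", "host1") are hypothetical, and both
helpers need root. The libata option to skip ports on resume is only a
proposal here, so without it this alone may not prevent the spin-up:

```python
from pathlib import Path


def soft_remove(disk, sysfs="/sys"):
    """Detach a disk (e.g. "sdb") from the SCSI layer before suspend.

    The disk must not have a mounted file system. sysfs is a parameter
    only so the helper can be tested against a scratch directory.
    """
    node = Path(sysfs) / "block" / disk / "device" / "delete"
    node.write_text("1\n")
    return node


def rescan_host(host, sysfs="/sys"):
    """Ask a SCSI host (e.g. "host1") to rescan all channels/targets/LUNs.

    The "- - -" wildcard means: any channel, any target, any LUN.
    """
    node = Path(sysfs) / "class" / "scsi_host" / host / "scan"
    node.write_text("- - -\n")
    return node


# Typical use (as root), with hypothetical names:
#     soft_remove("sdb")      # before suspend
#     rescan_host("host1")    # after resume, to make the disk reappear
```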

Given that most laptops these days do not use HDDs anymore, I do not think all
of this is worth the effort.
