Re: PM runtime_error handling missing in many drivers?

On 19.02.25 23:15, Brian Norris wrote:
On Wed, Feb 12, 2025 at 08:29:34PM +0100, Rafael J. Wysocki wrote:
The reason why runtime_error is there is to prevent runtime PM
callbacks from being run until something is done about the error,
under the assumption that running them in that case may make the
problem worse.

What makes you think it will make the problem worse? That seems like a
rather large assumption to me. What kind of things do you think go
wrong, that it requires the framework to stop any future attempts? Just
spam (e.g., logging noise, if -EIO is persistent)? Or something worse?

suspend() is three operations, potentially

a) record device state
b) arm remote wakeup
c) transition to a lower power state

I wouldn't trust a device to perform the first two steps
without error handling either. It is an unnecessary risk.
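
To make the three steps concrete, here is a minimal user-space sketch of a
suspend path that checks each step and unwinds on failure. It is only an
illustration: struct toy_dev, the toy_* helpers and the fail_step knob are
hypothetical names invented for this example and are not kernel API.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical device model; none of these names exist in the kernel. */
struct toy_dev {
	bool state_saved;
	bool wakeup_armed;
	bool low_power;
	int fail_step;	/* 0 = nothing fails, 1..3 = step that fails */
};

/* a) record device state */
static int toy_save_state(struct toy_dev *d)
{
	if (d->fail_step == 1)
		return -EIO;
	d->state_saved = true;
	return 0;
}

/* b) arm remote wakeup */
static int toy_arm_wakeup(struct toy_dev *d)
{
	if (d->fail_step == 2)
		return -EIO;
	d->wakeup_armed = true;
	return 0;
}

/* c) transition to a lower power state */
static int toy_enter_low_power(struct toy_dev *d)
{
	if (d->fail_step == 3)
		return -EIO;
	d->low_power = true;
	return 0;
}

/* Each step is checked; a failure undoes the steps that already ran. */
int toy_suspend(struct toy_dev *d)
{
	int ret;

	ret = toy_save_state(d);
	if (ret)
		return ret;

	ret = toy_arm_wakeup(d);
	if (ret)
		goto err_discard_state;

	ret = toy_enter_low_power(d);
	if (ret)
		goto err_disarm;

	return 0;

err_disarm:
	d->wakeup_armed = false;
err_discard_state:
	d->state_saved = false;
	return ret;
}
```

With fail_step set to 2, toy_suspend() returns -EIO and leaves the device
fully active, with no saved state and no armed wakeup left behind.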

And OTOH, there are clearly cases where retrying would be not only
acceptable, but expected -- so giving special case to -EAGAIN and
-EBUSY, per another branch of this thread, seems wise.

Yes
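
As a rough illustration of the policy under discussion, the following
user-space sketch models a sticky error that blocks further callbacks, with
-EAGAIN and -EBUSY treated as transient. This is a toy model and not the
runtime PM core: struct toy_rpm, toy_rpm_suspend() and toy_rpm_clear_error()
are hypothetical names, and returning -EINVAL while the error is latched is
an assumption of the sketch.

```c
#include <errno.h>

/* Hypothetical stand-in for the per-device runtime PM bookkeeping. */
struct toy_rpm {
	int runtime_error;	/* sticky error, 0 when clear */
	int calls;		/* how often the callback really ran */
};

/* Run the suspend callback unless an earlier hard error is latched. */
int toy_rpm_suspend(struct toy_rpm *r, int (*cb)(void))
{
	int ret;

	if (r->runtime_error)
		return -EINVAL;	/* assumption: refuse until cleared */

	r->calls++;
	ret = cb();

	/* Transient errors invite a retry; anything else is latched. */
	if (ret && ret != -EAGAIN && ret != -EBUSY)
		r->runtime_error = ret;

	return ret;
}

/* "Something is done about the error": explicit recovery. */
void toy_rpm_clear_error(struct toy_rpm *r)
{
	r->runtime_error = 0;
}

/* Two example callbacks for exercising the model. */
int toy_cb_busy(void) { return -EBUSY; }
int toy_cb_eio(void) { return -EIO; }
```

In this model a callback that keeps returning -EBUSY is retried on every
request, while a single -EIO stops all further attempts until
toy_rpm_clear_error() is called.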


I'd also note that AFAICT, there is no similar feature in system PM. If
suspend() fails, we unwind and report the error ... but still allow
future system suspend requests. resume() is even "worse" -- errors are
essentially logged and ignored.

Suspend requests from runtime PM are different. They happen spontaneously.
Secondly, failures to suspend in runtime PM are far cheaper.

I'm not sure if I see a substantial difference between suspend and
resume in that respect: If any of them fails, the state of the device
is kind of unstable.  In particular, if resume fails and the device
doesn't actually resume, something needs to be done about it or it
just becomes unusable.

Again, if you look at it in an abstract manner, this is a mess. Resume()
is actually two functions

a) transition to a power state that allows an operation
b) restore device settings

It is possible for the second step to fail after the first has worked.
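
A small user-space sketch of that partial failure, under the assumption that
the two steps can be observed separately. The names (struct toy_rdev,
toy_resume(), the TOY_* states) are hypothetical and exist only for this
example.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical states; the middle one is the awkward case. */
enum toy_resume_state {
	TOY_SUSPENDED,			/* step a) has not happened */
	TOY_POWERED_UNCONFIGURED,	/* a) worked, b) failed */
	TOY_ACTIVE,			/* both steps worked */
};

struct toy_rdev {
	enum toy_resume_state state;
	bool power_fails;	/* make step a) fail */
	bool restore_fails;	/* make step b) fail */
};

int toy_resume(struct toy_rdev *d)
{
	/* a) transition to a power state that allows an operation */
	if (d->power_fails)
		return -EIO;	/* device is still suspended */
	d->state = TOY_POWERED_UNCONFIGURED;

	/* b) restore device settings */
	if (d->restore_fails)
		return -EIO;	/* powered up, but settings are lost */
	d->state = TOY_ACTIVE;

	return 0;
}
```

Both failures return the same -EIO here, yet they leave the device in
different states, which is why a single error code from resume() says little
about what recovery is needed.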

To me, it's about the state of the device. If suspend failed, the device
may still be active and functional -- but not power-efficient. If resume
failed, the device may be suspended and non-functional.

But anyway, I don't think I require asymmetry; I'm just more interested
in unnecessary non-functionality. (Power inefficiency is less important,
as in the worst case, we can at least save our data, reboot, and try
again.)

You are calling for asymmetry ;-)

If you fail to resume, you will need to return an error. The functions
are just not equal in terms of consequences. We don't resume for fun.
We do, however, suspend just because a timer fires.

	Regards
		Oliver




