On 26.07.22 17:41, Rafael J. Wysocki wrote:
> On Tue, Jul 26, 2022 at 11:05 AM Oliver Neukum <oneukum@xxxxxxxx> wrote:

> I guess that depends on what is regarded as "the framework". I mean
> the PM-runtime code, excluding the bus type or equivalent.

Yes, we have multiple candidates in the generic case. Easy to overengineer.

>>> The idea was that drivers would clear these errors.
>>
>> I am afraid that is a deeply hidden layering violation. Yes, a driver's
>> resume() method may have failed. In that case, if that is the same
>> driver, it will obviously already know about the failure.
>
> So presumably it will do something to recover and avoid returning the
> error in the first place.

Yes, but that does not help us if they do return an error.

> From the PM-runtime core code perspective, if an error is returned by
> a suspend callback and it is not -EBUSY or -EAGAIN, the subsequent
> suspend is also likely to fail.

True.

> If a resume callback returns an error, any subsequent suspend or
> resume operations are likely to fail.

Also true, but the consequences are different.

> Storing the error effectively prevents subsequent operations from
> being carried out in both cases and that's why it is done.

I am afraid seeing these two operations as equivalent for this purpose
is a problem for two reasons:

1. suspend can be initiated by the generic framework
2. a failure to suspend leads to worse power consumption,
   while a failure to resume is -EIO, at best

>> PM operations, however, are operating on a tree. A driver requesting
>> a resume may get an error code. But it has no idea where this error
>> comes from. The generic code knows at least that.
>
> Well, what do you mean by "the generic code"?

In this case the device model, which has the tree and all dependencies.
Error handling here is potentially very complicated because

1. a driver can experience an error from a node higher in the tree
2. a driver can trigger a failure in a sibling
3. a driver for a node can be less specific than the drivers higher up

Reducing this to a single error condition is difficult.
Suppose you have a USB device with two interfaces. The driver for A
initiates a resume. Interface A is resumed; B reports an error.
Should this block further attempts to suspend the whole device?

>> Let's look at a USB storage device. The request to resume comes
>> from sd.c. sd.c is certainly not equipped to handle a PCI error
>> condition that has prevented a USB host controller from resuming.
>
> Sure, but this doesn't mean that suspending or resuming the device is
> a good idea until the error condition gets resolved.

Suspending clearly yes. Resuming is another matter. It has to work
if you want to operate without errors.

>> I am afraid this part of the API has issues. And they keep growing
>> the more we divorce the device driver from the bus driver, which
>> actually does the PM operation.
>
> Well, in general suspending or resuming a device is a collaborative
> effort and if one of the pieces falls over, making it work again
> involves fixing up the failing piece and notifying the others that it
> is ready again. However, that part isn't covered and I'm not sure if
> it can be covered in a sufficiently generic way.

True. But that still cannot solve the question of what is to be done
if error handling fails. Hence my proposal:

- record all failures
- heed the record only when suspending

	Regards
		Oliver
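
To make the proposal concrete, here is a rough userspace sketch of the
intended policy. It is not kernel code: struct pm_node, node_suspend(),
node_resume() and the demo_* callbacks are hypothetical names invented
for illustration. Clearing the record after a successful resume is an
assumption of the sketch, not something the proposal settles.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a device node; NOT the kernel's struct device. */
struct pm_node {
	int recorded_error;                 /* last recorded failure, 0 if none */
	int (*suspend)(struct pm_node *n);  /* driver or bus callback */
	int (*resume)(struct pm_node *n);   /* driver or bus callback */
};

/* Suspend heeds the record: a node with a recorded failure stays active. */
int node_suspend(struct pm_node *n)
{
	int err;

	assert(n != NULL && n->suspend != NULL);

	if (n->recorded_error)
		return n->recorded_error;

	err = n->suspend(n);
	/* -EBUSY and -EAGAIN are transient, so they are not recorded. */
	if (err && err != -EBUSY && err != -EAGAIN)
		n->recorded_error = err;
	return err;
}

/* Resume ignores the record: it is always attempted. */
int node_resume(struct pm_node *n)
{
	int err;

	assert(n != NULL && n->resume != NULL);

	err = n->resume(n);
	if (err)
		n->recorded_error = err;    /* record the failure */
	else
		n->recorded_error = 0;      /* assumption: success clears it */
	return err;
}

/* Hypothetical callbacks, only here to exercise the sketch. */
int demo_suspend_ok(struct pm_node *n)  { (void)n; return 0; }
int demo_resume_ok(struct pm_node *n)   { (void)n; return 0; }
int demo_resume_fail(struct pm_node *n) { (void)n; return -EIO; }
```

With this, a node whose resume failed with -EIO refuses to suspend,
but a later resume is still tried, and if it succeeds the node can be
suspended again.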