On Tue, Jul 26, 2022 at 11:05 AM Oliver Neukum <oneukum@xxxxxxxx> wrote:
>
> On 08.07.22 22:10, Rafael J. Wysocki wrote:
> > On 7/8/2022 1:03 PM, Vincent Whitchurch wrote:
>
> >> Perhaps Rafael can shed some light on this.
> >
> > The driver always knows more than the framework about the device's
> > actual state. The framework only knows that something failed, but it
> > doesn't know what it was and what way it failed.
>
> Hi,
>
> thinking long and deeply about this I do not think that this seemingly
> obvious assertion is actually correct.

I guess that depends on what is regarded as "the framework". I mean
the PM-runtime code, excluding the bus type or equivalent.

> > The idea was that drivers would clear these errors.
>
> I am afraid that is a deeply hidden layering violation. Yes, a driver's
> resume() method may have failed. In that case, if that is the same
> driver, it will obviously already know about the failure.

So presumably it will do something to recover and avoid returning the
error in the first place.

From the PM-runtime core code perspective, if an error is returned by
a suspend callback and it is not -EBUSY or -EAGAIN, the subsequent
suspend is also likely to fail.

If a resume callback returns an error, any subsequent suspend or
resume operations are likely to fail.

Storing the error effectively prevents subsequent operations from
being carried out in both cases and that's why it is done.

> PM operations, however, are operating on a tree. A driver requesting
> a resume may get an error code. But it has no idea where this error
> comes from. The generic code knows at least that.

Well, what do you mean by "the generic code"?

> Let's look at a USB storage device. The request to resume comes
> from sd.c. sd.c is certainly not equipped to handle a PCI error
> condition that has prevented a USB host controller from resuming.

Sure, but this doesn't mean that suspending or resuming the device is
a good idea until the error condition gets resolved.

> I am afraid this part of the API has issues. And they keep growing
> the more we divorce the device driver from the bus driver, which
> actually does the PM operation.

Well, in general suspending or resuming a device is a collaborative
effort and if one of the pieces falls over, making it work again
involves fixing up the failing piece and notifying the others that it
is ready again.

However, that part isn't covered and I'm not sure if it can be covered
in a sufficiently generic way.
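
For illustration only, the error-storing behaviour described above
(a suspend error other than -EBUSY or -EAGAIN, or any resume error,
is stored and blocks later operations until someone clears it) can be
modelled in plain userspace C. This is a rough sketch and not the
PM-runtime core code: struct model_dev, model_suspend(),
model_resume(), model_clear_error() and the model_cb_*() callbacks are
hypothetical names made up for this example, and returning -EINVAL for
a blocked operation is an assumption of the model.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a device; it only models the stored error. */
struct model_dev {
	int runtime_error;	/* 0 means no error is stored */
	int suspended;		/* 1 once a suspend has succeeded */
	int (*suspend_cb)(struct model_dev *dev);
	int (*resume_cb)(struct model_dev *dev);
};

/* Example callbacks: success, a transient refusal, a hard failure. */
int model_cb_ok(struct model_dev *dev)   { (void)dev; return 0; }
int model_cb_busy(struct model_dev *dev) { (void)dev; return -EBUSY; }
int model_cb_eio(struct model_dev *dev)  { (void)dev; return -EIO; }

/* Suspend: refused while an error is stored (assumed -EINVAL). */
int model_suspend(struct model_dev *dev)
{
	int ret;

	assert(dev != NULL);
	if (dev->runtime_error)
		return -EINVAL;
	if (dev->suspended)
		return 0;	/* simplification: nothing to do */

	ret = dev->suspend_cb ? dev->suspend_cb(dev) : 0;
	if (ret == -EBUSY || ret == -EAGAIN)
		return ret;	/* transient: not stored, may be retried */
	if (ret) {
		dev->runtime_error = ret;	/* blocks later operations */
		return ret;
	}
	dev->suspended = 1;
	return 0;
}

/* Resume: any callback error is stored and blocks later operations. */
int model_resume(struct model_dev *dev)
{
	int ret;

	assert(dev != NULL);
	if (dev->runtime_error)
		return -EINVAL;
	if (!dev->suspended)
		return 0;	/* simplification: nothing to do */

	ret = dev->resume_cb ? dev->resume_cb(dev) : 0;
	if (ret) {
		dev->runtime_error = ret;
		return ret;
	}
	dev->suspended = 0;
	return 0;
}

/* What "drivers would clear these errors" amounts to in this model. */
void model_clear_error(struct model_dev *dev)
{
	assert(dev != NULL);
	dev->runtime_error = 0;
}
```

In this model, a suspend callback returning -EBUSY leaves no trace,
while one returning -EIO makes every later model_suspend() or
model_resume() call fail until model_clear_error() is called. The
model says nothing about who is in a position to make that call, which
is the open question in this thread.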