On Tue, Jun 21, 2022 at 11:38:33AM +0200, Oliver Neukum wrote:
> On 20.06.22 16:42, Vincent Whitchurch wrote:
> > [110778.050000][ T27] rpm_resume: 0-0009 flags-4 cnt-1 dep-0 auto-1 p-0 irq-0 child-0
> > [110778.050000][ T27] rpm_return_int: rpm_resume+0x24d/0x11d0:0-0009 ret=-22
> >
> > The following patch fixes the issue on vcnl4000, but is this the right
> > fix? And, unless I'm missing something, there are dozens of drivers
> > with the same problem.
>
> Yes. The point of pm_runtime_resume_and_get() is to remove the need
> for handling errors when the resume fails. So I fail to see why a
> permanent record of a failure makes sense for this API.

I don't understand it either.

> > diff --git a/drivers/iio/light/vcnl4000.c b/drivers/iio/light/vcnl4000.c
> > index e02e92bc2928..082b8969fe2f 100644
> > --- a/drivers/iio/light/vcnl4000.c
> > +++ b/drivers/iio/light/vcnl4000.c
> > @@ -414,6 +414,8 @@ static int vcnl4000_set_pm_runtime_state(struct vcnl4000_data *data, bool on)
> >
> >  	if (on) {
> >  		ret = pm_runtime_resume_and_get(dev);
> > +		if (ret)
> > +			pm_runtime_set_suspended(dev);
> >  	} else {
> >  		pm_runtime_mark_last_busy(dev);
> >  		ret = pm_runtime_put_autosuspend(dev);
>
> If you need to add this to every driver, you can just as well add it to
> pm_runtime_resume_and_get() to avoid the duplication.

Yes, the documentation says that the error should be cleared, but it's
unclear why the driver is expected to do it. From the documentation it
looks like the driver is supposed to choose between
pm_runtime_set_active() and pm_runtime_set_suspended() to clear the
error, but how/why is this choice supposed to be made in the driver when
the driver doesn't know more than the framework about the status of the
device?

Perhaps Rafael can shed some light on this.

> But I am afraid we need to ask a deeper question. Is there a point
> in recording failures to resume? The error code is reported back.
> If a driver wishes to act upon it, it can.
> The core really only uses the result to block new PM operations.
> But nobody requests a resume unless it is necessary. Thus I fail
> to see the point of checking this flag in resume as opposed to
> suspend. If we fail, we fail, why not retry? It seems to me that the
> record should be used only during runtime suspend.

I guess this is also a question for Rafael.

Even if the error recording is removed from runtime_resume and only done
on suspend failures, all these drivers still have the problem of not
clearing the error, since the next resume will fail if that is not done.

> And as an immediate band aid, some errors like ENOMEM should
> never be recorded.
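
To make the failure mode concrete, here is a small userspace model of the
behaviour discussed above. It is only a sketch, not kernel code: struct
toy_dev and all toy_* functions are hypothetical stand-ins that I made up
for illustration, and the model assumes (as the ret=-22 in the trace
suggests) that a resume attempt is refused with -EINVAL while an earlier
error is still recorded. Return values are simplified compared to the
real rpm_resume().

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical toy stand-in for the runtime PM state of one device. */
struct toy_dev {
	int usage_count;	/* models the usage counter */
	int runtime_error;	/* models the recorded (sticky) error */
	bool active;		/* true once the device is resumed */
	int fail_next_resume;	/* error the next resume callback returns, 0 = success */
};

/* Models the resume path: refuses with -EINVAL while an error is recorded. */
static int toy_rpm_resume(struct toy_dev *dev)
{
	int ret;

	if (dev->runtime_error)
		return -EINVAL;
	if (dev->active)
		return 0;

	ret = dev->fail_next_resume;
	dev->fail_next_resume = 0;	/* the fault is transient in this model */
	if (ret) {
		dev->runtime_error = ret;	/* failure is recorded */
		return ret;
	}
	dev->active = true;
	return 0;
}

/* Models pm_runtime_resume_and_get(): the reference is dropped on failure. */
static int toy_resume_and_get(struct toy_dev *dev)
{
	int ret;

	dev->usage_count++;
	ret = toy_rpm_resume(dev);
	if (ret)
		dev->usage_count--;
	return ret;
}

/* Models the error-clearing side effect of pm_runtime_set_suspended(). */
static void toy_set_suspended(struct toy_dev *dev)
{
	dev->runtime_error = 0;
	dev->active = false;
}

/* Hypothetical wrapper: what the vcnl4000 patch does, moved into a helper. */
static int toy_resume_and_get_clearing(struct toy_dev *dev)
{
	int ret = toy_resume_and_get(dev);

	if (ret)
		toy_set_suspended(dev);
	return ret;
}
```

With a single transient failure injected, toy_resume_and_get() returns
the original error once and -EINVAL on every later call, while
toy_resume_and_get_clearing() returns the error once and then succeeds
on retry. Whether clearing to "suspended" is always the right choice is
exactly the open question above.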