On Fri, Jan 7, 2022 at 4:27 PM Stephen Boyd <swboyd@xxxxxxxxxxxx> wrote:
>
> Quoting Rob Clark (2022-01-06 10:14:46)
> > From: Rob Clark <robdclark@xxxxxxxxxxxx>
> >
> > System suspend uses pm_runtime_force_suspend(), which cheekily bypasses
> > the runpm reference counts. This doesn't actually work so well when the
> > GPU is active. So add a reasonable delay waiting for the GPU to become
> > idle.
>
> Maybe also say:
>
> Failure to wait during system wide suspend leads to GPU hangs seen on
> resume.

The fallout can actually be a lot more than just GPU hangs.. that is
just the case that is easy (for us) to observe because the crash
logging captures them. But sync/async external aborts are also
possible.. and I think even just undefined behavior (ie. I think if
the timing works out right, it can survive but just "lose" rendering
that hadn't completed yet)

> >
> > Alternatively we could just return -EBUSY in this case, but that has the
> > disadvantage of causing system suspend to fail.
> >
> > Signed-off-by: Rob Clark <robdclark@xxxxxxxxxxxx>
> > ---
> >  drivers/gpu/drm/msm/adreno/adreno_device.c | 9 +++++++++
> >  drivers/gpu/drm/msm/msm_gpu.c              | 3 +++
> >  drivers/gpu/drm/msm/msm_gpu.h              | 3 +++
> >  3 files changed, 15 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/msm/adreno/adreno_device.c b/drivers/gpu/drm/msm/adreno/adreno_device.c
> > index 93005839b5da..b677ca3fd75e 100644
> > --- a/drivers/gpu/drm/msm/adreno/adreno_device.c
> > +++ b/drivers/gpu/drm/msm/adreno/adreno_device.c
> > @@ -611,6 +611,15 @@ static int adreno_resume(struct device *dev)
> >  static int adreno_suspend(struct device *dev)
> >  {
> >         struct msm_gpu *gpu = dev_to_gpu(dev);
> > +       int ret = 0;
>
> Please don't assign and then immediately overwrite.
>
> > +
> > +       ret = wait_event_timeout(gpu->retire_event,
> > +                                !msm_gpu_active(gpu),
> > +                                msecs_to_jiffies(1000));
> > +       if (ret == 0) {
>
> The usual pattern is
>
>         long timeleft;
>
>         timeleft = wait_event_timeout(...)
>         if (!timeleft) {
>                 /* no time left; timed out */
>
> Can it be the same pattern here? It helps because people sometimes
> forget that wait_event_timeout() returns the time that is left and not
> an error code when it times out.

ok, I'll update in v2..

BR,
-R

> > +               dev_err(dev, "Timeout waiting for GPU to suspend\n");
> > +               return -EBUSY;
> > +       }
> >
> >         return gpu->funcs->pm_suspend(gpu);
> >  }
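
For anyone reading along, here is a rough userspace sketch of the
return-value contract being discussed: wait_event_timeout() returns 0
when the timeout elapses with the condition still false, and otherwise
the time left (at least 1). This is not the kernel implementation and
not the v2 patch. Every fake_* name is hypothetical, and a loop
iteration stands in for one jiffy.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the GPU; busy_ticks counts work still pending. */
struct fake_gpu {
	long busy_ticks;
};

static bool fake_gpu_active(const struct fake_gpu *gpu)
{
	return gpu->busy_ticks > 0;
}

/*
 * Mimics the return contract of wait_event_timeout() (assumed model):
 *   0    - timeout elapsed and the GPU is still active
 *   >= 1 - GPU went idle; value is the time left (at least 1)
 * One loop iteration models one tick and retires one unit of work.
 */
static long fake_wait_idle_timeout(struct fake_gpu *gpu, long timeout)
{
	long left = timeout;

	while (fake_gpu_active(gpu) && left > 0) {
		gpu->busy_ticks--;
		left--;
	}

	if (fake_gpu_active(gpu))
		return 0;

	return left > 0 ? left : 1;
}

/* Suspend path using the "timeleft" naming suggested in the review. */
static int fake_suspend(struct fake_gpu *gpu, long timeout)
{
	long timeleft;

	timeleft = fake_wait_idle_timeout(gpu, timeout);
	if (!timeleft) {
		/* no time left; timed out */
		return -EBUSY;
	}

	return 0;
}

/* Usage: one GPU that idles in time, one that does not. */
void fake_demo(void)
{
	struct fake_gpu quick = { .busy_ticks = 3 };
	struct fake_gpu stuck = { .busy_ticks = 20 };

	assert(fake_suspend(&quick, 10) == 0);
	assert(fake_suspend(&stuck, 10) == -EBUSY);
}
```

Naming the variable timeleft rather than ret makes it harder to
misread the zero case as success, which is the point of the suggested
pattern.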