Hi, On Fri, May 15, 2020 at 9:37 AM Doug Anderson <dianders@xxxxxxxxxxxx> wrote: > > Hi, > > On Fri, May 15, 2020 at 5:06 AM <kalyan_t@xxxxxxxxxxxxxx> wrote: > > > > On 2020-05-14 21:47, Doug Anderson wrote: > > > Hi, > > > > > > On Fri, May 1, 2020 at 6:31 AM Kalyan Thota <kalyan_t@xxxxxxxxxxxxxx> > > > wrote: > > >> > > >> "The PM core always increments the runtime usage counter > > >> before calling the ->suspend() callback and decrements it > > >> after calling the ->resume() callback" > > >> > > >> DPU and DSI are managed as runtime devices. When > > >> suspend is triggered, PM core adds a refcount on all the > > >> devices and calls device suspend, since usage count is > > >> already incremented, runtime suspend was not getting called > > >> and it kept the clocks on which resulted in target not > > >> entering into XO shutdown. > > >> > > >> Add changes to force suspend on runtime devices during pm sleep. > > >> > > >> Changes in v1: > > >> - Remove unnecessary checks in the function > > >> _dpu_kms_disable_dpu (Rob Clark). > > >> > > >> Changes in v2: > > >> - Avoid using suspend_late to reset the usagecount > > >> as suspend_late might not be called during suspend > > >> call failures (Doug). > > >> > > >> Changes in v3: > > >> - Use force suspend instead of managing device usage_count > > >> via runtime put and get API's to trigger callbacks (Doug). > > >> > > >> Changes in v4: > > >> - Check the return values of pm_runtime_force_suspend and > > >> pm_runtime_force_resume API's and pass appropriately (Doug). > > >> > > >> Changes in v5: > > > > > > Can you please put the version number properly in your subject? It's > > > really hard to tell one version of your patch from another. > > > > > > > > >> - With v4 patch, test cycle has uncovered issues in device resume. > > >> > > >> On bubs: cmd tx failures were seen as SW is sending panel off > > >> commands when the dsi resources are turned off. > > >> > > >> Upon suspend, DRM driver will issue a NULL composition to the > > >> dpu, followed by turning off all the HW blocks. > > >> > > >> v5 changes will serialize the NULL commit and resource unwinding > > >> by handling them under PM prepare and PM complete phases there by > > >> ensuring that clks are on when panel off commands are being > > >> processed. > > > > > > I'm still most definitely not an expert in how all the DRM pieces all > > > hook up together, but the solution you have in this patch seems wrong > > > to me. As far as I can tell the "prepare" state isn't supposed to be > > > actually doing the suspend work and here that's exactly what you're > > > doing. I think you should find a different solution to ensure > > > ordering is correct. > > > > > > -Doug > > > > > > > Hi, > > Quite honestly I'm probably not the right person to be reviewing this > code. I mostly just noticed one of your early patches and it looked > strange to me. Hopefully someone with actual experience in how all > the DRM components work together can actually review and see if this > makes sense. Maybe Sean would know better? > > That being said, let me at least look at what you're saying... > > > > Prepare and Complete are callbacks defined as part of Sleep and Resume > > sequence > > > > Entering PM SUSPEND the phases are : prepare --> suspend --> > > suspend_late --> suspend_noirq. > > While leaving PM SUSPEND the phases are: resume_noirq --> resume_early > > --> resume --> complete. > > Sure, it's part of the sequence. It's also documented in pm.h as: > > * The principal role of this callback is to prevent new children of > * the device from being registered after it has returned (the driver's > * subsystem and generally the rest of the kernel is supposed to prevent > * new calls to the probe method from being made too once @prepare() has > * succeeded). > > It does not feel like that matches your usage of this call. > > > > The reason to push drm suspend handling to PM prepare phase is that > > parent here will trigger a modeset to turn off the timing and > > subsequently the panel. > > the child devices should not turn of their clocks before parent unwinds > > the composition. Hence they are serialized as per the sequence mentioned > > above. > > So the general model in Linux is that children suspend before their > parents, right? So you're saying that, in this case, the parent needs > to act on the child before the child suspends. Is that correct? > > Rather than hijacking the prepare/complete, I'd be at least slightly > inclined to move the other driver to turn off its clocks in > suspend_late and to turn them back on in resume_early? That seems to > be what was done in "analogix_dp-rockchip.c" to solve a similar > problem. > > > > A similar approach is taken by other driver that use drm framework. In > > this driver, the device registers for prepare and complete callbacks to > > handle drm_suspend and drm_resume. > > https://gitlab.freedesktop.org/drm/msm/-/blob/msm-next/drivers/gpu/drm/exynos/exynos_drm_drv.c#L163 > > OK, if there is another driver in DRM then I guess I won't object too > strongly. Note that when searching for other drivers I noticed this > bit in todo.rst: > > * Most drivers (except i915 and nouveau) that use > * drm_atomic_helper_suspend/resume() can probably be converted to use > * drm_mode_config_helper_suspend/resume(). Also there's still open-coded version > * of the atomic suspend/resume code in older atomic modeset drivers. > > Does anything get fixed if you do that? It seems like it'd cleanup > your code a bit so maybe worth doing anyway... > > --- > > I guess the last question I'd want resolved is why you have this asymmetry: > > SET_SYSTEM_SLEEP_PM_OPS(pm_runtime_force_suspend, msm_pm_resume) > > Why couldn't you use pm_runtime_force_resume()? I'm curious if you had answers to any of the questions I posed in my review. -Doug