On Wed, 10 Jun 2020 at 16:26, Tomasz Figa <tfiga@xxxxxxxxxxxx> wrote:
>
> On Wed, Jun 10, 2020 at 9:14 PM Ezequiel Garcia
> <ezequiel@xxxxxxxxxxxxxxxxxxxx> wrote:
> >
> > On Wed, 10 Jun 2020 at 16:03, Tomasz Figa <tfiga@xxxxxxxxxxxx> wrote:
> > >
> > > On Wed, Jun 10, 2020 at 03:52:39PM -0300, Ezequiel Garcia wrote:
> > > > Hi everyone,
> > > >
> > > > Thanks for the patch.
> > > >
> > > > On Wed, 10 Jun 2020 at 07:33, Tomasz Figa <tfiga@xxxxxxxxxxxx> wrote:
> > > > >
> > > > > On Wed, Jun 10, 2020 at 12:29 PM Hans Verkuil <hverkuil@xxxxxxxxx> wrote:
> > > > > >
> > > > > > On 21/05/2020 19:11, Tomasz Figa wrote:
> > > > > > > Hi Jerry,
> > > > > > >
> > > > > > > On Wed, Dec 04, 2019 at 08:47:29PM +0800, Jerry-ch Chen wrote:
> > > > > > >> From: Pi-Hsun Shih <pihsun@xxxxxxxxxxxx>
> > > > > > >>
> > > > > > >> Add two functions that can be used to stop new jobs from being queued /
> > > > > > >> continue running queued job. This can be used while a driver using m2m
> > > > > > >> helper is going to suspend / wake up from resume, and can ensure that
> > > > > > >> there's no job running in suspend process.
> > > [snip]
> > > > > >
> > > > > > I assume this will be part of a future patch series that calls these new functions?
> > > > >
> > > > > The mtk-jpeg encoder series depends on this patch as well, so I guess
> > > > > it would go together with whichever is ready first.
> > > > >
> > > > > I would also envision someone changing the other existing drivers to
> > > > > use the helpers, as I'm pretty much sure some of them don't handle
> > > > > suspend/resume correctly.
> > > > >
> > > >
> > > > This indeed looks very good. If I understood the issue properly,
> > > > the change would be useful for both stateless (e.g. hantro, et al.)
> > > > and stateful (e.g. coda) codecs.
> > > >
> > > > Hantro uses pm_runtime_force_suspend, and I believe that
> > > > could be enough for proper suspend/resume operation.
> > >
> > > Unfortunately, no. :(
> > >
> > > If the decoder is already decoding a frame, that would forcefully power
> > > off the hardware and possibly even cause a system lockup if we are
> > > unlucky to gate a clock in the middle of a bus transaction.
> > >
> >
> > pm_runtime_force_suspend calls pm_runtime_disable, which
> > says:
> >
> > """
> > Increment power.disable_depth for the device and if it was zero previously,
> > cancel all pending runtime PM requests for the device and wait for all
> > operations in progress to complete.
> > """
> >
> > Doesn't this mean it waits for the current job (if there is one) and
> > prevents any new jobs from being issued?
> >
>
> I'd love it if the PM runtime subsystem handled job management of all the
> driver subsystems automatically, but at the moment it's not aware of
> any jobs. :) The description says as much as it says - it stops any
> internal jobs of the PM subsystem - i.e. asynchronous suspend/resume
> requests. It doesn't have any awareness of V4L2 M2M jobs.
>

Doh, of course. I saw "pending requests" and somehow
imagined it would wait for the runtime_put.

I see now that these patches are the way to go.

> > > I just inspected the code now and actually found one more bug in its
> > > power management handling. device_run() calls clk_bulk_enable() before
> > > pm_runtime_get_sync(), but only the latter is guaranteed to actually
> > > power on the relevant power domains, so we end up clocking unpowered
> > > hardware.
> > >
> >
> > How about we just move clk_enable/disable to runtime PM?
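
Something along these lines is what I'm picturing, just to illustrate the
idea (completely untested sketch, written from memory, so the exact
struct/field names may not match the driver):

#include <linux/clk.h>
#include <linux/pm_runtime.h>

#include "hantro.h"

static int hantro_runtime_resume(struct device *dev)
{
	struct hantro_dev *vpu = dev_get_drvdata(dev);

	/*
	 * Runtime PM powers up the domain before calling this, so enabling
	 * the clocks here can no longer clock unpowered hardware.
	 */
	return clk_bulk_enable(vpu->variant->num_clocks, vpu->clocks);
}

static int hantro_runtime_suspend(struct device *dev)
{
	struct hantro_dev *vpu = dev_get_drvdata(dev);

	clk_bulk_disable(vpu->variant->num_clocks, vpu->clocks);
	return 0;
}

/*
 * device_run() would then drop its clk_bulk_enable()/clk_bulk_disable()
 * calls and only take a runtime PM reference with pm_runtime_get_sync()
 * before programming the hardware, dropping it again once the job is done.
 */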
> >
> > Since we use autosuspend delay, it theoretically has
> > some impact, which is why I was refraining from doing so.
> >
> > I can't decide if this impact would be marginal or significant.
> >
>
> I'd also refrain from doing this. Clock gating corresponds to the
> bigger part of the power savings from runtime power management, since
> it stops the dynamic power consumption and only leaves the static
> leakage. That said, the Hantro IP blocks have some internal clock
> gating as well, so it might not be as pronounced, depending on the
> custom vendor integration logic surrounding the Hantro hardware.
>

OK, I agree. We need to fix this issue then, changing the order
of the calls.

> Actually, even if autosuspend is not used, the runtime PM subsystem has
> some internal back-off mechanism based on measured power on and power
> off latencies. The driver should call pm_runtime_get_sync() first and
> then enable any necessary clocks. I can see that currently inside the
> resume callback we have some hardware accesses. If those really need
> to be there, they should be surrounded with appropriate clock enable
> and clock disable calls.
>

Currently, it's only used by imx8mq, and it's enclosed by
clk_bulk_prepare_enable/disable_unprepare.

I am quite sure the prepare/unprepare usage is an oversight
on our side, but it doesn't do any harm either.

Moving the clock handling to hantro_runtime_resume is possible,
but it looks like just low-hanging fruit.

> > > > I'm not seeing any code in CODA to handle this, so not sure
> > > > how it's handling suspend/resume.
> > > >
> > > > Maybe we can have CODA as the first user, given it's a well-maintained
> > > > driver and should be fairly easy to test.
> > >
> > > I remember checking a number of drivers using the m2m helpers randomly
> > > and none of them implemented suspend/resume correctly. I suppose that
> > > was not discovered because normally the userspace itself would stop the
> > > operation before the system is suspended, although it's not an API
> > > guarantee.
> > >
> >
> > Indeed. Do you have any recommendations for how we could
> > test this case to make sure we are handling it correctly?
>
> I'd say that a simple offscreen command line gstreamer/ffmpeg decode
> with suspend/resume loop in another session should be able to trigger
> some issues.
>

I can try to fix the above clk/pm issue and take this helper
on the same series, if that's useful.

Thanks,
Ezequiel
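
PS: for completeness, this is roughly how I'd picture a driver wiring the
new helpers into its system sleep ops, on top of the runtime PM sketch
above. It assumes the helpers keep the v4l2_m2m_suspend()/v4l2_m2m_resume()
naming and reuses hantro-style names, so treat it as a hand-written,
untested sketch rather than actual patch material:

#include <linux/pm.h>
#include <linux/pm_runtime.h>
#include <media/v4l2-mem2mem.h>

#include "hantro.h"

static int __maybe_unused hantro_suspend(struct device *dev)
{
	struct hantro_dev *vpu = dev_get_drvdata(dev);

	/* Stop scheduling new m2m jobs and wait for the running one, if any. */
	v4l2_m2m_suspend(vpu->m2m_dev);

	return pm_runtime_force_suspend(dev);
}

static int __maybe_unused hantro_resume(struct device *dev)
{
	struct hantro_dev *vpu = dev_get_drvdata(dev);
	int ret;

	ret = pm_runtime_force_resume(dev);
	if (ret)
		return ret;

	/* Let the m2m core kick off any jobs queued while we were asleep. */
	v4l2_m2m_resume(vpu->m2m_dev);

	return 0;
}

static const struct dev_pm_ops hantro_pm_ops = {
	SET_SYSTEM_SLEEP_PM_OPS(hantro_suspend, hantro_resume)
	SET_RUNTIME_PM_OPS(hantro_runtime_suspend, hantro_runtime_resume, NULL)
};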