Re: [RFC PATCH V4 1/4] media: v4l2-mem2mem: add v4l2_m2m_suspend, v4l2_m2m_resume

Ezequiel Garcia <ezequiel@xxxxxxxxxxxxxxxxxxxx> · Wed, 10 Jun 2020 16:14:30 -0300

On Wed, 10 Jun 2020 at 16:03, Tomasz Figa <tfiga@xxxxxxxxxxxx> wrote:
>
> On Wed, Jun 10, 2020 at 03:52:39PM -0300, Ezequiel Garcia wrote:
> > Hi everyone,
> >
> > Thanks for the patch.
> >
> > On Wed, 10 Jun 2020 at 07:33, Tomasz Figa <tfiga@xxxxxxxxxxxx> wrote:
> > >
> > > On Wed, Jun 10, 2020 at 12:29 PM Hans Verkuil <hverkuil@xxxxxxxxx> wrote:
> > > >
> > > > On 21/05/2020 19:11, Tomasz Figa wrote:
> > > > > Hi Jerry,
> > > > >
> > > > > On Wed, Dec 04, 2019 at 08:47:29PM +0800, Jerry-ch Chen wrote:
> > > > >> From: Pi-Hsun Shih <pihsun@xxxxxxxxxxxx>
> > > > >>
> > > > >> Add two functions that can be used to stop new jobs from being queued /
> > > > >> continue running queued job. This can be used while a driver using m2m
> > > > >> helper is going to suspend / wake up from resume, and can ensure that
> > > > >> there's no job running in suspend process.
> [snip]
> > > >
> > > > I assume this will be part of a future patch series that calls these new functions?
> > >
> > > The mtk-jpeg encoder series depends on this patch as well, so I guess
> > > it would go together with whichever is ready first.
> > >
> > > I would also envision someone changing the other existing drivers to
> > > use the helpers, as I'm pretty much sure some of them don't handle
> > > suspend/resume correctly.
> > >
> >
> > This indeed looks very good. If I understood the issue properly,
> > the change would be useful for both stateless (e.g. hantro, et al)
> > and stateful (e.g. coda) codecs.
> >
> > Hantro uses pm_runtime_force_suspend, and I believe that
> > could is enough for proper suspend/resume operation.
>
> Unfortunately, no. :(
>
> If the decoder is already decoding a frame, that would forcefully power
> off the hardware and possibly even cause a system lockup if we are
> unlucky to gate a clock in the middle of a bus transaction.
>

pm_runtime_force_suspend calls pm_runtime_disable, which
says:

"""
 Increment power.disable_depth for the device and if it was zero previously,
 cancel all pending runtime PM requests for the device and wait for all
 operations in progress to complete.
"""

Doesn't this mean it waits for the current job (if there is one) and
prevents any new jobs to be issued?

> I just inspected the code now and actually found one more bug in its
> power management handling. device_run() calls clk_bulk_enable() before
> pm_runtime_get_sync(), but only the latter is guaranteed to actually
> power on the relevant power domains, so we end up clocking unpowered
> hardware.
>

How about we just move clk_enable/disable to runtime PM?

Since we use autosuspend delay, it theoretically has
some impact, which is why I was refraining from doing so.

I can't decide if this impact would be marginal or significant.

> >
> > I'm not seeing any code in CODA to handle this, so not sure
> > how it's handling suspend/resume.
> >
> > Maybe we can have CODA as the first user, given it's a well-maintained
> > driver and should be fairly easy to test.
>
> I remember checking a number of drivers using the m2m helpers randomly
> and none of them implemented suspend/resume correctly. I suppose that
> was not discovered because normally the userspace itself would stop the
> operation before the system is suspended, although it's not an API
> guarantee.
>

Indeed. Do you have any recomendations for how we could
test this case to make sure we are handling it correctly?

> Best regards,
> Tomasz