Re: Proposed updates and guidelines for MPEG-2, H.264 and H.265 stateless support

Tomasz Figa <tfiga@xxxxxxxxxxxx> · Wed, 22 May 2019 15:01:48 +0900

On Tue, May 21, 2019 at 8:45 PM Paul Kocialkowski
<paul.kocialkowski@xxxxxxxxxxx> wrote:
>
> Hi,
>
> On Tue, 2019-05-21 at 19:27 +0900, Tomasz Figa wrote:
> > On Thu, May 16, 2019 at 2:43 AM Paul Kocialkowski
> > <paul.kocialkowski@xxxxxxxxxxx> wrote:
> > > Hi,
> > >
> > > On Wednesday, 15 May 2019 at 10:42 -0400, Nicolas Dufresne wrote:
> > > > On Wednesday, 15 May 2019 at 12:09 +0200, Paul Kocialkowski wrote:
> > > > > Hi,
> > > > >
> > > > > With the Rockchip stateless VPU driver in the works, we now have a
> > > > > better idea of what the situation is like on platforms other than
> > > > > Allwinner. This email shares my conclusions about the situation and how
> > > > > we should update the MPEG-2, H.264 and H.265 controls accordingly.
> > > > >
> > > > > - Per-slice decoding
> > > > >
> > > > > We've discussed this one already[0] and Hans has submitted a patch[1]
> > > > > to implement the required core bits. When we agree it looks good, we
> > > > > should lift the restriction that all slices must be concatenated and
> > > > > have them submitted as individual requests.
> > > > >
> > > > > One question is what to do about other controls. I feel like it would
> > > > > make sense to always pass all the required controls for decoding the
> > > > > slice, including the ones that don't change across slices. But there
> > > > > may be no particular advantage to this and only downsides. Not doing it
> > > > > and relying on the "control cache" can work, but we need to specify
> > > > > that only a single stream can be decoded per opened instance of the
> > > > > v4l2 device. This is the assumption we're going with for handling
> > > > > multi-slice anyway, so it shouldn't be an issue.
> > > >
> > > > My opinion on this is that the m2m instance is a state, and the driver
> > > > should be responsible for doing time-division multiplexing across
> > > > the jobs of multiple m2m instances. Doing the time-division
> > > > multiplexing in userspace would require some sort of daemon to work
> > > > properly across processes. I also think the kernel is a better place
> > > > for doing resource access scheduling in general.
> > >
> > > I agree with that, yes. We always have a single m2m context and specific
> > > controls per opened device so keeping cached values works out well.
> > >
> > > So maybe we shall explicitly require that the request with the first
> > > slice for a frame also contains the per-frame controls.
> > >
> >
> > Agreed.
> >
> > One more argument not to allow such multiplexing is that despite the

^^ By "such multiplexing" I meant userspace multiplexing.

> > API being called "stateless", there is actually some state saved
> > between frames, e.g. the Rockchip decoder writes some intermediate
> > data to some local buffers which need to be given to the decoder to
> > decode the next frame. Actually, on Rockchip there is even a
> > requirement to keep the reference list entries in the same order
> > between frames.
>
> Well, what I'm suggesting is to have one stream per m2m context, but it
> should certainly be possible to have multiple m2m contexts (multiple
> userspace open calls) that decode different streams concurrently.
>
> Is that really going to be a problem for Rockchip? If so, then the
> driver should probably enforce allowing a single userspace open and m2m
> context at a time.

No, that's not what I meant. The driver can of course switch between
different sets of private buffers when scheduling different contexts,
as long as userspace doesn't attempt to do any multiplexing itself.
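
To illustrate, here is a rough sketch of per-context private state with
the driver doing the switching. All names (dec_ctx, dec_hw, run_job and
so on) are made up for this example and are not taken from the Rockchip
driver or any other real driver; the "hardware" is just a struct, so
this only shows the idea:

```c
#define PRIV_BUF_SIZE 16

/* Hypothetical per-open() decoding context (one stream each). */
struct dec_ctx {
	/* Stands in for intermediate data the decoder keeps between frames. */
	unsigned char priv_buf[PRIV_BUF_SIZE];
	unsigned int frames_decoded;
};

/* Hypothetical hardware model: it only sees the currently bound context. */
struct dec_hw {
	struct dec_ctx *cur;
};

/* Point the hardware at the private buffers of the given context. */
static void hw_bind_ctx(struct dec_hw *hw, struct dec_ctx *ctx)
{
	hw->cur = ctx;
}

/* "Decode" one frame: updates the inter-frame state of the bound context. */
static void hw_decode_frame(struct dec_hw *hw)
{
	hw->cur->priv_buf[0]++;
	hw->cur->frames_decoded++;
}

/* Scheduler entry point: switch private buffers if needed, then run the job. */
static void run_job(struct dec_hw *hw, struct dec_ctx *ctx)
{
	if (hw->cur != ctx)
		hw_bind_ctx(hw, ctx);
	hw_decode_frame(hw);
}
```

Interleaving jobs from two contexts through run_job() leaves each
context's private state untouched by the other, which is all the
scheduling needs. It breaks down only if userspace feeds two streams
through the same context, because then both would share one priv_buf.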

Best regards,
Tomasz