Re: [RFC] Stateful codecs and requirements for compressed formats

Tomasz Figa <tfiga@xxxxxxxxxxxx> · Thu, 11 Jul 2019 21:47:14 +0900

On Thu, Jul 11, 2019 at 10:42 AM Nicolas Dufresne <nicolas@xxxxxxxxxxxx> wrote:
>
> On Wednesday, 3 July 2019 at 17:32 +0900, Tomasz Figa wrote:
> > Hi Hans,
> >
> > On Fri, Jun 28, 2019 at 11:34 PM Hans Verkuil <hverkuil@xxxxxxxxx> wrote:
> > > Hi all,
> > >
> > > I hope I Cc-ed everyone with a stake in this issue.
> > >
> > > One recurring question is how a stateful encoder fills buffers and how a stateful
> > > decoder consumes buffers.
> > >
> > > The most generic case is that an encoder produces a bitstream and just fills each
> > > CAPTURE buffer to the brim before continuing with the next buffer.
> > >
> > > I don't think there are drivers that do this; I believe that all drivers just
> > > output a single compressed frame per buffer. For interlaced formats I understand it is either
> > > one compressed field per buffer, or two compressed fields per buffer (this is
> > > what I heard, I don't know if this is true).
> > >
> > > In any case, I don't think this is specified anywhere. Please correct me if I am
> > > wrong.
> > >
> > > The latest stateful codec spec is here:
> > >
> > > https://hverkuil.home.xs4all.nl/codec-api/uapi/v4l/dev-mem2mem.html
> > >
> > > Assuming what I described above is indeed the case, then I think this should
> > > be documented. I don't know enough to say whether a flag is needed somewhere to
> > > describe the behavior for interlaced formats, or whether we can leave this open
> > > and have userspace detect this.
> > >
> >
> > From the Chromium perspective, we don't have any use case for encoding
> > interlaced content, so we'll be okay with whatever the interested
> > parties decide on. :)
> >
> > > For decoders it is more complicated. The stateful decoder spec is written with
> > > the assumption that userspace can just fill each OUTPUT buffer to the brim with
> > > the compressed bitstream. I.e., no need to split at frame or other boundaries.
> > >
> > > See section 4.5.1.7 in the spec.
> > >
> > > But I understand that various HW decoders *do* have limitations. I would really
> > > like to know about those, since that needs to be exposed to userspace somehow.
> >
> > AFAIK mtk-vcodec needs H.264 SPS and PPS to be split into their own
> > separate buffers. I believe it also needs 1 buffer to contain exactly
> > 1 frame and 1 frame to be fully contained inside 1 buffer.
> >
> > Venus also needed 1 buffer to contain exactly 1 frame and 1 frame to
> > be fully contained inside 1 buffer. It used to have some specific
> > requirements regarding SPS and PPS too, but I think that was fixed in
> > the firmware.
> >
> > > Specifically, the venus decoder needs to know the resolution of the coded video
> > > beforehand
> >
> > I don't think that's true for venus. It does parsing and can detect
> > the resolution.
> >
> > However that's probably the case for coda...
> >
> > > and it expects a single frame per buffer (how does that work for
> > > interlaced formats?).
> > >
> > > Such requirements mean that some userspace parsing is still required, so these
> > > decoders are not completely stateful.
> > >
> > > Can every codec author give information about their decoder/encoder?
> > >
> > > I'll start off with my virtual codec driver:
> > >
> > > vicodec: the decoder fully parses the bitstream. The encoder produces a single
> > > compressed frame per buffer. This driver doesn't yet support interlaced formats,
> > > but when that is added it will encode one field per buffer.
> > >
> > > Let's see what the results are.
> >
> > s5p-mfc:
> >  decoder: fully parses the bitstream,
> >  encoder: produces a single frame per buffer (haven't tested interlaced stuff)
> >
> > mtk-vcodec:
> >  decoder: expects separate buffers for SPS, PPS and full frames
> > (including some random stuff like SEIMessage),
> >  encoder: produces a single frame per buffer (haven't tested interlaced stuff)
>
> Interesting, do I read correctly that the encoder does not produce
> what the decoder needs?

Apparently. :)

But given all the diversity that was mentioned in this thread, one
can't expect to be able to feed a decoder with the exact buffers from
an encoder, although I'm not sure why one would even want to do that
in the first place.
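
For illustration, the kind of userspace repacking that a decoder with
the mtk-vcodec-style requirements listed above would need could look
roughly like the sketch below. It is only an illustration in Python, not
code from any driver or from Chromium. The function names
(split_nal_units, pack_buffers) are made up, and it assumes an H.264
Annex B byte stream with exactly one slice per frame, which real streams
do not guarantee.

```python
import re

# Annex B start code: 00 00 01 (a 4-byte start code just has one more
# leading zero byte, which is stripped from the previous NAL unit below).
START_CODE = re.compile(b"\x00\x00\x01")

NAL_SPS, NAL_PPS = 7, 8
VCL_TYPES = {1, 5}  # non-IDR slice, IDR slice


def split_nal_units(stream):
    """Return the NAL units (without start codes) of an Annex B stream."""
    starts = [m.end() for m in START_CODE.finditer(stream)]
    nals = []
    for i, begin in enumerate(starts):
        end = starts[i + 1] - 3 if i + 1 < len(starts) else len(stream)
        # A NAL unit never ends in 0x00, so trailing zeros are either the
        # extra byte of a 4-byte start code or padding.
        nal = stream[begin:end].rstrip(b"\x00")
        if nal:
            nals.append(nal)
    return nals


def pack_buffers(stream):
    """Group NAL units into buffers: SPS alone, PPS alone, one frame each.

    ASSUMPTION: every VCL NAL unit is a complete frame (one slice per
    frame). Other NAL units (SEI, AUD, ...) wait for the next slice and
    travel in the same buffer as that slice.
    """
    buffers, pending = [], []
    for nal in split_nal_units(stream):
        nal_type = nal[0] & 0x1F
        unit = b"\x00\x00\x00\x01" + nal
        if nal_type in (NAL_SPS, NAL_PPS):
            buffers.append(unit)  # parameter sets get their own buffer
        else:
            pending.append(unit)
            if nal_type in VCL_TYPES:
                buffers.append(b"".join(pending))
                pending = []
    if pending:  # leftover non-VCL units at the end of the stream
        buffers.append(b"".join(pending))
    return buffers
```

Each returned element would then go into its own OUTPUT buffer. A real
implementation would have to parse slice headers (or rely on access unit
delimiters) to find frame boundaries when a frame has multiple slices.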

Best regards,
Tomasz