Re: [RFC] Stateful codecs and requirements for compressed formats

Hans Verkuil <hverkuil@xxxxxxxxx> · Wed, 10 Jul 2019 10:43:17 +0200

On 6/28/19 8:09 PM, Nicolas Dufresne wrote:
> On Friday, 28 June 2019 at 16:34 +0200, Hans Verkuil wrote:
>> Hi all,
>>
>> I hope I Cc-ed everyone with a stake in this issue.
>>
>> One recurring question is how a stateful encoder fills buffers and how a stateful
>> decoder consumes buffers.
>>
>> The most generic case is that an encoder produces a bitstream and just fills each
>> CAPTURE buffer to the brim before continuing with the next buffer.
>>
>> I don't think there are drivers that do this; I believe that all drivers just
>> output a single compressed frame. For interlaced formats I understand it is either
>> one compressed field per buffer or two compressed fields per buffer (this is
>> what I heard; I don't know if it is true).
>>
>> In any case, I don't think this is specified anywhere. Please correct me if I am
>> wrong.
>>
>> The latest stateful codec spec is here:
>>
>> https://hverkuil.home.xs4all.nl/codec-api/uapi/v4l/dev-mem2mem.html
>>
>> Assuming what I described above is indeed the case, then I think this should
>> be documented. I don't know enough to say whether a flag is needed somewhere to
>> describe the behavior for interlaced formats, or whether we can leave this open
>> and have userspace detect it.
>>
>>
>> For decoders it is more complicated. The stateful decoder spec is written with
>> the assumption that userspace can just fill each OUTPUT buffer to the brim with
>> the compressed bitstream. I.e., no need to split at frame or other boundaries.
>>
>> See section 4.5.1.7 in the spec.
>>
>> But I understand that various HW decoders *do* have limitations. I would really
>> like to know about those, since that needs to be exposed to userspace somehow.
>>
>> Specifically, the venus decoder needs to know the resolution of the coded video
>> beforehand and it expects a single frame per buffer (how does that work for
>> interlaced formats?).
>>
>> Such requirements mean that some userspace parsing is still required, so these
>> decoders are not completely stateful.
>>
>> Can every codec author give information about their decoder/encoder?
>>
>> I'll start off with my virtual codec driver:
>>
>> vicodec: the decoder fully parses the bitstream. The encoder produces a single
>> compressed frame per buffer. This driver doesn't yet support interlaced formats,
>> but when that is added it will encode one field per buffer.
>>
>> Let's see what the results are.
> 
> Hans, I thought a summary of what existing userspace expects / assumes
> would be nice.
> 
> GStreamer:
> ==========
> Encodes:
>   fwht, h263, h264, hevc, jpeg, mpeg4, vp8, vp9
> Decodes:
>   fwht, h263, h264, hevc, jpeg, mpeg2, mpeg4, vc1, vp8, vp9
> 
> It assumes that each encoded v4l2_buffer contains exactly one frame
> (any format, two fields for interlaced content). It may still work
> otherwise, but some issues will appear: timestamp shifts and loss of
> metadata (e.g. timecode, closed captions, etc.).
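
For illustration, here is a minimal Python sketch of the two OUTPUT-buffer
layouts discussed in this thread: filling each buffer to the brim, as the
stateful decoder spec permits, versus one compressed frame per buffer, as
GStreamer and FFmpeg assume. It is hypothetical and not taken from any
driver or framework: frames are modelled as plain byte strings, the buffer
size is an assumed parameter, and no V4L2 ioctls are involved.

```python
from typing import List, Tuple

# Hypothetical model: a "frame" is just the compressed bytes of one frame,
# as a parser or demuxer would hand them over. No V4L2 calls are made.
Frame = bytes
Buffer = Tuple[bytes, List[int]]  # (payload, indices of contributing frames)


def pack_to_brim(frames: List[Frame], buf_size: int) -> List[Buffer]:
    """Concatenate the bitstream and cut it into buf_size chunks.

    Each buffer records every frame that contributes at least one byte,
    so a buffer may span a frame boundary.
    """
    if buf_size <= 0:
        raise ValueError("buf_size must be positive")
    buffers: List[Buffer] = []
    payload = bytearray()
    owners: List[int] = []
    for idx, frame in enumerate(frames):
        pos = 0
        while pos < len(frame):
            take = min(buf_size - len(payload), len(frame) - pos)
            payload += frame[pos:pos + take]
            if not owners or owners[-1] != idx:
                owners.append(idx)
            pos += take
            if len(payload) == buf_size:
                # Buffer is full: queue it and start a new one.
                buffers.append((bytes(payload), owners))
                payload, owners = bytearray(), []
    if payload:
        buffers.append((bytes(payload), owners))
    return buffers


def pack_one_frame_per_buffer(frames: List[Frame], buf_size: int) -> List[Buffer]:
    """Put exactly one frame in each buffer; fail if a frame does not fit."""
    buffers: List[Buffer] = []
    for idx, frame in enumerate(frames):
        if len(frame) > buf_size:
            raise ValueError(
                f"frame {idx} ({len(frame)} bytes) exceeds buffer size {buf_size}")
        buffers.append((frame, [idx]))
    return buffers
```

With frames of 5, 3 and 6 bytes and 4-byte buffers, the brim layout yields
a second buffer holding the tail of frame 0 and all of frame 1, so a single
per-buffer timestamp cannot describe it; this is one way the timestamp
shifts mentioned above could arise.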

When you say 'each encoded v4l2_buffer contains exactly one frame',
does that include H.264 SPS/PPS headers? Or are those passed in
a separate v4l2_buffer? Ditto for FFmpeg.
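
To illustrate the two alternatives in that question, here is a minimal,
hypothetical Python sketch (not the behaviour of any particular driver or
framework) that splits an H.264 Annex B byte stream on start codes and
packs it one coded picture per buffer, with SPS/PPS either in a buffer of
their own or prepended to the next picture. It assumes every picture is a
single slice and does not implement the full access unit boundary rules
of the H.264 spec.

```python
from typing import List

START_CODE = b"\x00\x00\x01"

# nal_unit_type values from the H.264 spec
NAL_SLICE = 1  # coded slice of a non-IDR picture
NAL_IDR = 5    # coded slice of an IDR picture
NAL_SPS = 7    # sequence parameter set
NAL_PPS = 8    # picture parameter set


def split_annexb(stream: bytes) -> List[bytes]:
    """Split an Annex B byte stream into NAL units, start codes removed."""
    nals: List[bytes] = []
    pos = stream.find(START_CODE)
    while pos != -1:
        start = pos + len(START_CODE)
        nxt = stream.find(START_CODE, start)
        end = len(stream) if nxt == -1 else nxt
        # Simplification: dropping trailing zero bytes removes the leading
        # 0x00 of a following 4-byte start code, but would also drop
        # cabac_zero_words; a real parser must be more careful.
        nal = stream[start:end].rstrip(b"\x00")
        if nal:
            nals.append(nal)
        pos = nxt
    return nals


def nal_type(nal: bytes) -> int:
    # nal_unit_type is the low 5 bits of the first NAL header byte.
    return nal[0] & 0x1F


def pack_pictures(stream: bytes, headers_in_own_buffer: bool) -> List[bytes]:
    """Pack NAL units into buffers, one coded picture per buffer.

    Assumes single-slice pictures, so each slice NAL closes a picture.
    NAL units are re-emitted with 3-byte start codes.
    """
    buffers: List[bytes] = []
    headers = b""  # pending SPS/PPS
    current = b""  # pending NAL units of the picture being assembled
    for nal in split_annexb(stream):
        unit = START_CODE + nal
        if nal_type(nal) in (NAL_SPS, NAL_PPS):
            headers += unit
            continue
        if headers:
            if headers_in_own_buffer:
                buffers.append(headers)   # SPS/PPS get their own buffer
            else:
                current += headers        # prepend to the next picture
            headers = b""
        current += unit
        if nal_type(nal) in (NAL_SLICE, NAL_IDR):
            buffers.append(current)
            current = b""
    # Flush leftovers in stream order (e.g. a stream ending in headers).
    if current:
        buffers.append(current)
    if headers:
        buffers.append(headers)
    return buffers
```

For a stream of SPS, PPS, an IDR slice and a non-IDR slice,
headers_in_own_buffer=True gives three buffers (SPS+PPS, IDR, non-IDR)
and False gives two (SPS+PPS+IDR, non-IDR).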

Regards,

	Hans

> 
> FFmpeg:
> =======
> Encodes:
>   h263, h264, hevc, mpeg4, vp8
> Decodes:
>   h263, h264, hevc, mpeg2, mpeg4, vc1, vp8, vp9
> 
> Similarly to GStreamer, it assumes that one AVPacket will fit in one
> v4l2_buffer. On the encoding side this seems less of a problem, but they
> don't fully implement the FFmpeg codec API for frame matching, which I
> suspect would create some ambiguity if they did.
> 
> Chromium:
> =========
> Decodes:
>   H264, VP8, VP9
> Encodes:
>   H264
> 
> This is the code I know least, but the encoder does not seem
> affected by NAL alignment. The keyframe flag and timestamps seem
> to be used and are likely expected to correlate with the input, so I
> suspect there is some possible ambiguity if the output is not a
> full frame. For the decoder, I'll have to ask someone else to comment;
> the code is hard to follow and I could not find the place where
> output buffers are filled. I thought the GStreamer code was tough, but
> this is a similar mess.
> 
> Nicolas