On Friday, 28 June 2019 at 16:34 +0200, Hans Verkuil wrote:
> Hi all,
>
> I hope I Cc-ed everyone with a stake in this issue.
>
> One recurring question is how a stateful encoder fills buffers and how a stateful
> decoder consumes buffers.
>
> The most generic case is that an encoder produces a bitstream and just fills each
> CAPTURE buffer to the brim before continuing with the next buffer.
>
> I don't think there are drivers that do this, I believe that all drivers just
> output a single compressed frame. For interlaced formats I understand it is either
> one compressed field per buffer, or two compressed fields per buffer (this is
> what I heard, I don't know if this is true).
>
> In any case, I don't think this is specified anywhere. Please correct me if I am
> wrong.
>
> The latest stateful codec spec is here:
>
> https://hverkuil.home.xs4all.nl/codec-api/uapi/v4l/dev-mem2mem.html
>
> Assuming what I described above is indeed the case, then I think this should
> be documented. I don't know enough if a flag is needed somewhere to describe
> the behavior for interlaced formats, or can we leave this open and have userspace
> detect this?
>
> For decoders it is more complicated. The stateful decoder spec is written with
> the assumption that userspace can just fill each OUTPUT buffer to the brim with
> the compressed bitstream. I.e., no need to split at frame or other boundaries.
>
> See section 4.5.1.7 in the spec.
>
> But I understand that various HW decoders *do* have limitations. I would really
> like to know about those, since that needs to be exposed to userspace somehow.

So in "4.5.1.7. Decoding", there is a bit of confusion. The text speaks
about the ordering of frames in capture and output, but the bullet points
state that output buffers aren't frames.

The following note about timestamps creates more confusion. It says
there is potentially (it's not very affirmative) timestamp matching that
lets you detect reordering done by the driver, but it does not clarify
how the timestamps are to be handled if the packing is random.

What seems entirely missing from what we discussed is a per-format
clarification of codec behaviour. I was assuming the NAL alignment would
be documented for the H264 and HEVC formats. It makes sense to allow some
more flexibility, since these formats are bytestreams with start codes,
but to me, full-frame behaviour is what existing userspace expects and we
should make this the de facto default. And if the buffer size ends up
too small (badly predicted), I believe we should use the source change
event to allow handling that. That being said, we have been able to
survive this for a long time.

For VP8 and VP9, which don't really have a bytestream format, I do
assume it's logical to always enforce full frames. But if not, special
care is needed to ensure the driver can reconstruct the full frames,
since a firmware won't be able to parse the frame boundaries.

Now, when I saw you taking over, I thought it was clear that this was
only the common bits of the spec and that a per-format specification
would be developed later.

> Specifically, the venus decoder needs to know the resolution of the coded video
> beforehand and it expects a single frame per buffer (how does that work for
> interlaced formats?).

If the firmware works in a 1:1 fashion, with H264 you may have two AUs
composing one frame in an interlaced stream (and that may change for each
frame). In HEVC you'd always have two AUs.

>
> Such requirements mean that some userspace parsing is still required, so these
> decoders are not completely stateful.

There was a discussion about the meaning of stateful/stateless. It is
not strictly related to parsing; the amount of parsing required is only
a side effect.

The stateful decoder HW (or firmware) offers an interface based on
streams. It hides the state of the decoded stream. As a side effect, the
HW can only be multiplexed if the firmware handles that. On the other
hand, a stateless decoder offers an API where you configure the decoding
of a frame (and sometimes a slice). Two consecutive frames do not have to
be part of the same stream, which has the side effect of allowing
applications to handle their own multiplexing.

>
> Can every codec author give information about their decoder/encoder?
>
> I'll start off with my virtual codec driver:
>
> vicodec: the decoder fully parses the bitstream. The encoder produces a single
> compressed frame per buffer. This driver doesn't yet support interlaced formats,
> but when that is added it will encode one field per buffer.

I just wanted to highlight that there is a lot of format-specific
behaviour here. Especially this last one, since it implies that the
capture format will be field = ALTERNATE for interlaced decoding (this is
a relatively rare format). So the behaviour here can already be inferred
from the capture format (apart from the fact that the interlace mode
cannot be enumerated, so for encoding it's a bit of a pain to guess).
And the spec already contains the information needed to match the pairs
(or detect a lost field).

>
> Let's see what the results are.
>
> Regards,
>
> Hans
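
To make the NAL alignment point above a bit more concrete, here is a
minimal sketch of the kind of userspace splitting it implies for an
H264 Annex B bytestream: locate the start codes and report each NAL
unit. The names nal_span, next_start_code and split_annexb are
hypothetical helpers invented for this illustration, not part of V4L2
or any library. The sketch assumes H264 (5-bit nal_unit_type) and does
not try to group NALs into access units or full frames, which would
need further parsing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type: one NAL unit located inside an Annex B buffer. */
struct nal_span {
    size_t offset;  /* first NAL byte, just past the start code */
    size_t size;    /* NAL size in bytes, start code excluded */
    uint8_t type;   /* H264 nal_unit_type: low 5 bits of the first byte */
};

/*
 * Return the offset of the next 00 00 01 pattern at or after `from`,
 * or `len` if there is none.
 */
static size_t next_start_code(const uint8_t *buf, size_t len, size_t from)
{
    for (size_t i = from; i + 3 <= len; i++) {
        if (buf[i] == 0 && buf[i + 1] == 0 && buf[i + 2] == 1)
            return i;
    }
    return len;
}

/*
 * Split an Annex B buffer into NAL units. Writes at most `max_out`
 * entries into `out` and returns how many were written.
 */
static size_t split_annexb(const uint8_t *buf, size_t len,
                           struct nal_span *out, size_t max_out)
{
    assert(buf != NULL && out != NULL);

    size_t count = 0;
    size_t sc = next_start_code(buf, len, 0);

    while (sc < len && count < max_out) {
        size_t start = sc + 3;
        size_t next = next_start_code(buf, len, start);
        size_t end = next;

        /*
         * Zero bytes right before the next 00 00 01 belong to a 4-byte
         * start code or to trailing padding, not to this NAL (a NAL
         * unit never ends with a 0x00 byte).
         */
        while (end > start && buf[end - 1] == 0)
            end--;

        if (end > start) {
            out[count].offset = start;
            out[count].size = end - start;
            out[count].type = buf[start] & 0x1f;
            count++;
        }
        sc = next;
    }
    return count;
}
```

A caller that wants one frame per OUTPUT buffer would still have to
decide which of the returned NALs belong to the same access unit; that
decision is exactly the per-format behaviour that is not specified
today.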