On Fri, 26 Jul 2019 08:28:28 +0200 Boris Brezillon <boris.brezillon@xxxxxxxxxxxxx> wrote: > On Thu, 25 Jul 2019 23:39:11 -0300 > Ezequiel Garcia <ezequiel@xxxxxxxxxxxxx> wrote: > > > On Thu, 2019-07-25 at 21:36 +0200, Paul Kocialkowski wrote: > > > Hi, > > > > > > On Thu 25 Jul 19, 08:42, Boris Brezillon wrote: > > > > On Fri, 5 Jul 2019 19:16:18 +0200 > > > > Boris Brezillon <boris.brezillon@xxxxxxxxxxxxx> wrote: > > > > > > > > > On Fri, 05 Jul 2019 13:40:03 -0300 > > > > > Ezequiel Garcia <ezequiel@xxxxxxxxxxxxx> wrote: > > > > > > > > > > > Hi Boris, Paul, > > > > > > > > > > > > On Wed, 2019-07-03 at 14:28 +0200, Boris Brezillon wrote: > > > > > > > Looks like some stateless decoders expect slices to be prefixed with > > > > > > > ANNEX B start codes (they most likely do some kind of bitstream parsing > > > > > > > and/or need that to delimit slices when doing per frame decoding). > > > > > > > Since skipping those start codes for dummy stateless decoders (those > > > > > > > expecting all params to be passed through controls) should be pretty > > > > > > > easy, let's mandate that all slices be prepended with ANNEX B start > > > > > > > codes. > > > > > > > > > > > > > > If we ever need to support AVC headers, we can add a new menu control > > > > > > > to select the type of NAL header to use. > > > > > > > > > > > > > > Signed-off-by: Boris Brezillon <boris.brezillon@xxxxxxxxxxxxx> > > > > > > > Reviewed-by: Paul Kocialkowski <paul.kocialkowski@xxxxxxxxxxx> > > > > > > > --- > > > > > > > Changes in v3: > > > > > > > * Add Paul's R-b > > > > > > > > > > > > > > Changes in v2: > > > > > > > * None > > > > > > > --- > > > > > > > Documentation/media/uapi/v4l/ext-ctrls-codec.rst | 1 + > > > > > > > 1 file changed, 1 insertion(+) > > > > > > > > > > > > > > diff --git a/Documentation/media/uapi/v4l/ext-ctrls-codec.rst b/Documentation/media/uapi/v4l/ext-ctrls-codec.rst > > > > > > > index 7a1947f5be96..3ae1367806cf 100644 > > > > > > > --- a/Documentation/media/uapi/v4l/ext-ctrls-codec.rst > > > > > > > +++ b/Documentation/media/uapi/v4l/ext-ctrls-codec.rst > > > > > > > @@ -1726,6 +1726,7 @@ enum v4l2_mpeg_video_h264_hierarchical_coding_type - > > > > > > > :ref:`h264`, section 7.4.3 "Slice Header Semantics". For further > > > > > > > documentation, refer to the above specification, unless there is > > > > > > > an explicit comment stating otherwise. > > > > > > > + All slices should be prepended with an ANNEX B start code. > > > > > > > > > > > > > > > > > > > Currently, the H264 slice V4L2_PIX_FMT_H264_SLICE_RAW, > > > > > > is specified to _not_ contain the ANNEX B start code. > > > > > > > > > > Yep, we should provably rename the format. > > > > > > > > Paul, are you okay with this rename? > > > > > > Sorry for the very long response time here, I've had a hard time getting back > > > into the context of all this. > > > > > > > s/V4L2_PIX_FMT_H264_SLICE_RAW/V4L2_PIX_FMT_H264_SLICE/ > > > > > > > > or > > > > > > > > s/V4L2_PIX_FMT_H264_SLICE_RAW/V4L2_PIX_FMT_H264_SLICE_ANNEXB/ > > > > > > I'd be in favor of the former (V4L2_PIX_FMT_H264_SLICE) and passing offsets > > > to the beinning and after the start code. That would be more flexible, but one > > > downside could be decoders that some decoders only take a specific start code. > > > > > > On the other hand I don't think that having one pixel format for each type of > > > start code would be very reasonable, so I'd rather see an offset for now and > > > perhaps a menu control later if needed to specify which types of start codes are > > > supported. > > > > > > > If I am reading the spec correctly, Annex B start code is specified to always > > be the 3-byte start code: 0x000001. > > > > The first NAL of a frame may have an additional 0x00, which effectively means > > the start code of the first NAL of a frame _can_ be 4-byte 0x00000001, > > in addition to the 3-byte 0x000001. > > That's not my understanding of the Annex B section (quoting the spec > for reference): > > " > The byte stream format consists of a sequence of byte stream NAL unit > syntax structures. Each byte stream NAL unit syntax structure contains > one start code prefix followed by one nal_unit( NumBytesInNALunit ) > syntax structure. It may (and under some circumstances, it shall) also > contain an additional zero_byte syntax element. It may also contain one > or more additional trailing_zero_8bits syntax elements. When it is the > first byte stream NAL unit in the bitstream, it may also contain one or > more additional leading_zero_8bits syntax elements. > " > > To me, it looks like the start code can always be 0x000001 or > 0x00000001. The first NAL can have extra leading 0x00 bytes, not the > following ones, *but*, all NALs can have an arbitrary number of > trailing 0x00 bytes. I guess stateless decoders unconditionally apply > the "skip_leading_0_bytes logic" described in chapter B.1.1 of the spec > to deal with these potential trailing 0x00 bytes. > Stateful decoders (those parsing Annex B NAL headers) might skip this > "skip leading 0x00 bytes" step for NAL > 0, but I suspect they just > always skip leading 0x00 bytes because > - it's simple enough > - they anyway need the logic for the first NAL > - that would require extra information about the NAL number which > stateless decoder usually don't have > > > > > In other words, there aren't multiple Annex B type of start codes, and only > > two options for the format of the slice: NAL units with or without a start code. > > There's also the AVC NAL header, and I'm pretty sure you can't use the > "add leading 0x00 bytes" trick to align things as you wish with that > one. > > > > > Therefore, I can't see any point in having this offset. > > Assuming the Annex B start codes can contain an arbitrary number of > leading 0x00 bytes, we can align things according to the codec > expectations. But, as I said below, that implies exposing those > alignment constraints. > > > > > > > I'd also to discuss some concerns Ezequiel and I have regarding this > > > > change. Some (most?) codec have alignment constraints on the buffer > > > > they pass to the HW. For HW that support Annex B parsing, that's no > > > > problem because the start of the buffer will be aligned on a page (I'm > > > > assuming page alignment should cover 99% of the alignment constraints). > > > > But HW that need to skip the start code will have to pass a non-aligned > > > > buffer (annex B start code is 3 byte long). > > > > Paul looked at the Cedrus driver and thinks it can be handled correctly > > > > thanks to the VE_H264_VLD_OFFSET field (which encodes an offset in bit), > > > > but I fear this might be a problem on other HW. > > > > > > > > So, I'm asking again, are we sure we want to handle the raw (no start > > > > code) and annex-b cases using the same pixel format? If we do, what's > > > > the plan to address those potential alignment constraints? Should > > > > we provide a way for userspace to define where the start-code ends so it > > > > can align things properly (annex B can be extended with extra 00 > > > > bytes at the beginning)? If we do that, that means userspace has to > > > > know about those alignment constraints, or take something big enough. > > > > Another option would be to use a bounce buffer when things are not > > > > aligned properly. > > > > > > > > I'd really like to get feedback on those points before sending a v4. > > > > > > Mhh I don't really know what would be best for handling that. Either way, I > > > don't see how more pixel formats would really help solve the issue, so I'm still > > > in favor of one. > > > > > > Having a control that specifies an alignment constraint for the slice beginning > > > could work (as long as we make it optional, although userspace should be > > > required to abide by it when it is present). > > By making that, you put the burden on both sides of the stack: > > - the kernel side will have to deal with the unaligned cases (using a > bounce buffer) > - userspace apps/libs that want to avoid an extra copy will have to > check this constraint and align things properly anyway I'd like to revise my statement. Ideally, the drivers should take care of such mis-alignments or unsupported NAL header types by copying/re-formatting the OUTPUT buffer so that existing apps work out of the box when the driver is added, which means we'll have to take care of that kernel-side anyway. Handling selection of the best encoding-mode/NAL-header-type in userspace is useful if one wants to improve perfs. > > Plus, the alignment thing won't work for AVC headers, so I think we > should actually have a control to select the NAL header type rather > than expose some alignment constraints (or have one pix fmt per NAL > header type, but you don't seem to like the idea, so I'm trying > to find something else :-)). > > And if we go for this option (control to select the NAL header type), > I'm wondering why we're not making that NAL-header type selection > mandatory from the start. We don't have to support all NAL headers at > first (can be Annex B only), but, by making this control selection > non-optional, we'll at least give a decent feedback to userspace > (setting NAL header control fails because the selected NAL header type > is not supported by the HW) instead of returning an error on the > decoding operation (which, depending on how verbose the driver is, can > be quite hard to figure out). > > > > > > > I guess it's not such a high price to pay for a unified codec interface :) > > If by unified you mean exposing only one pixel format, then yes, it's > unified. Doesn't make it easier to deal with from the userspace > perspective IMHO. > > To sum-up, I'm fine keeping one pixel format, but I'm no longer sure > not exposing the NAL header type is a good option. We've seen that > providing alignment guarantees for HW expecting raw bitstream (without > the start code) might become challenging at some point. So I'd opt for > making this selection explicit. After all, it's just an extra control > to set from userspace, and 2 extra switch-case: one to select the most > appropriate NAL header type, and another one to fill the buffer with > the appropriate header (if there's one).