Re: [RFC PATCH v2] media: Add AV1 uAPI

Daniel Almeida <daniel.almeida@xxxxxxxxxxxxx> · Tue, 31 May 2022 13:42:01 -0300

Sending this again, as apparently my last submission contained some HTML 
that prevented it from being sent on the media ML

---

Hi Steve,

> I think the below definition is expected to cause a build error.
>
> +struct v4l2_av1_loop_restoration {
> +       u8 flags;
>
> s/u8/__u8/ is needed.
>
> At least, this change was needed to fix this build error on Chromium
> build environment.

This will be fixed in RFC v3

> Question about tile info structure.
>
> struct v4l2_av1_tile_info {
> __u8 flags;
> __u32 mi_col_starts[V4L2_AV1_MAX_TILE_COLS + 1];
> __u32 mi_row_starts[V4L2_AV1_MAX_TILE_ROWS + 1];
> __u32 width_in_sbs_minus_1[V4L2_AV1_MAX_TILE_COLS];
> __u32 height_in_sbs_minus_1[V4L2_AV1_MAX_TILE_ROWS];
>
> I see below from the spec and gstreamer implementation
> for width_in_sbs_minus_1 and height_in_sbs_minus_1 computation.
>
>   sb_cols = seq_header->use_128x128_superblock ?
>       ((parser->state.mi_cols + 31) >> 5) : ((parser->state.mi_cols + 
15) >> 4);
>   sb_rows = seq_header->use_128x128_superblock ? 
((parser->state.mi_rows +
>           31) >> 5) : ((parser->state.mi_rows + 15) >> 4);
>
> Are we confident that V4L2_AV1_MAX_TILE_COLS is good enough size for
> width_in_sbs_minus_1?
> Or does it potentially need to be V4L2_AV1_MAX_TILE_COLS+1?
>
> I am asking to double check because I see V4L2_AV1_MAX_TILE_COLS+1
> used for corresponding field in libgav1.
> int tile_column_width_in_superblocks[kMaxTileColumns + 1];

I have checked with a few other APIs to be on the safe side.

In VA-API they  use a trick to save space on the last element, therefore 
these two arrays will only be 63 members wide.

In NVDEC, these two arrays are 64 members wide, which is the same as our 
V4L2 stateless implementation.

In DXVA, these two arrays are also 64 members wide

While going through the spec alongside with the libgav1 source code, I 
notice that the index used to index into the two arrays eventually gets 
assigned to tile_info->tile_rows and tile_info->tile_cols, i.e.

https://source.chromium.org/chromium/chromium/src/+/main:third_party/libgav1/src/src/obu_parser.cc;l=1704;drc=242da5037807dde3daf097ba74f875db83b8b613

https://source.chromium.org/chromium/chromium/src/+/main:third_party/libgav1/src/src/obu_parser.cc;drc=242da5037807dde3daf097ba74f875db83b8b613;l=1729

But the spec says that these variables must be less than or equal to 
MAX_TILE_ROWS (i.e. 64) and MAX_TILE_COLS (i.e. 64), respectively, i.e.:

> TileCols specifies the number of tiles across the frame. It is a 
requirement of bitstream conformance that TileCols is less
> than or equal to MAX_TILE_COLS.

> TileRows specifies the number of tiles down the frame. It is a 
requirement of bitstream conformance that TileRows is less
> than or equal to MAX_TILE_ROWS.

In which case only the first 64 members would be filled when actually 
submitting to the accelerator, i.e.:

https://source.chromium.org/chromium/chromium/src/+/main:media/gpu/windows/d3d11_av1_accelerator.cc;drc=242da5037807dde3daf097ba74f875db83b8b613;l=316

Given what I said above, I feel confident with the current implementation.

> struct v4l2_ctrl_av1_frame_header {
>   struct v4l2_av1_tile_info tile_info;
>   struct v4l2_av1_quantization quantization;
>   struct v4l2_av1_segmentation segmentation;
>   struct v4l2_av1_loop_filter loop_filter;
>   struct v4l2_av1_cdef cdef;
>   struct v4l2_av1_loop_restoration loop_restoration;
>   struct v4l2_av1_global_motion global_motion;
>
> We used "v4l2_ctrl_vp9_frame" for the similar purpose.
>
> I thought "_header" can be confusing in a sense that these are
> parameters setup from parsing av1 frame header,
> not necessarily "header" itself.
>
> How about making it "v4l2_ctrl_av1_frame" similar to vp9,
> or "v4l2_ctrl_av1_frame_params"?

Ok, expect this change on RFC v3.

> Don't we also need V4L2_PIX_FMT_AV1 in addition to V4L2_PIX_FMT_AV1_FRAME
> as we do with both VP8 and VP9? I see V4L2_PIX_FMT_AV1 is missing.

No, as the non "_FRAME" pixformats are used for the stateful interface.

> 1. "tab" seems to be used before "descr = ". […] for other cases.
> +++ b/drivers/media/v4l2-core/v4l2-ioctl.c
> @@ -1441,6 +1441,7 @@ static void v4l_fill_fmtdesc(struct 
v4l2_fmtdesc *fmt)
>                 case V4L2_PIX_FMT_MT21C:        descr = "Mediatek
> Compressed Format"; break;
> +               case V4L2_PIX_FMT_AV1_FRAME: descr = "AV1 Frame"; break;
>
> 2. nit: s/the  AOMedia/the AOMedia/
>
> This patch adds the  AOMedia Video 1 (AV1) kernel uAPI.
>
> 3. nit: s/AV1 film_gram/AV1 film grain/ ?
> +++ b/include/media/v4l2-ctrls.h
>
> + * @p_av1_film_grain:          Pointer to an AV1 film_grain.

Will be fixed in v3.

>
> #define V4L2_AV1_LOOP_FILTER_FLAG_DELTA_LF_PRESENT BIT(2)
>
> /**
>  * struct v4l2_av1_loop_filter - AV1 Loop filter params as defined in 
section
>  * 6.8.10. "Loop filter semantics" of the AV1 specification.
> ......
>
> struct v4l2_av1_loop_filter {
> ......
> __u8 delta_lf_res;
> __u8 delta_lf_multi;
> };
>
> - I think we should mention "6.8.16. Loop filter delta parameters 
semantics" in the comment too.
>

Ok

> - What was the reason "delta_lf_present" is defined with 
V4L2_AV1_LOOP_FILTER_FLAG_DELTA_LF_PRESENT
> instead of being inside of "v4l2_av1_loop_filter"?
> In other words, why do we want to treat it differently from 
delta_lf_res or delta_lf_multi?
> I am asking this question as this was confusing to me.

Usually we try to keep single-bit flags into a single "flags" field to 
save space. It is not a rule, but tends to get applied most of the time 
(by almost all codec APIs, not only V4L2 stateless)

I did fail to see that delta_lf_multi is only a single bit wide though, 
so by RFC v3 I will possibly have a flag for it as well.

> AV1 uAPI is using BIT() macro, which is probably from a kernel 
internal header <linux/bits.h>.
> Is this planned usage? We think we can't include it from userspace.
>
> Thank you Chen-Yu for sharing his thought on the issue.

Looking at the other codec APIs in V4L2 stateless, apparently the 
default is to declare the flag using a literal. I will convert the flags 
in AV1 to not use the BIT macro anymore.

> Question about update_ref_delta, update_mode_delta flags for loop 
filter params in the spec.
>
> I don't see these flags in v4l2_av1_loop_filter struct.
>
> After looking at gstreamer implementation, I think arrays ref_deltas, 
mode_deltas are only filled in when these flags are 1.
>
> Is this correct understanding?
> If not, can you explain the background why these flags are omitted?

Possibly forgotten. Will fix in RFC v3.

> I am not sure how to setup loop_restoration_size[0] in 
v4l2_av1_loop_restoration struct,
> which seems to use RESTORATION_TILESIZE_MAX = 256 at least for 
gstreamer implementation.
>
> Is this RESTORATION_TILESIZE_MAX something potentially needs to be 
added in the API by any chance?
> I do see this from the AV1 spec.

I usually only #define constants if they're used in the actual uAPI code 
somehow as opposed to in intermediary steps such as parsing. You can 
still define the spec constants as needed in userspace code. In this 
particular case, you can compute loop_restoration_size[0] by following 
the spec implementation, i.e.:

LoopRestorationSize[ 0 ] = RESTORATION_TILESIZE_MAX >> (2 - lr_unit_shift);

Where you can #define RESTORATION_TILESIZE_MAX 256 in your own userspace 
code without it having to be part of the uAPI. I don't believe that 
drivers will ever use that constant, but those that do may #define it on 
their own code.

> could you have the V4L2 CID stuff inserted consistently? In some 
places they are inserted before stateless HEVC / after VP8_FRAME, while 
in others they are after VP9_FRAME. I'd expect them all to be at the 
very end of the stateless block, after VP9_FRAME,
>

I am adding this feedback from Chen-Yu ^ as a sign that the issue it 
talks about will be fixed in RFC v3.

-- Daniel