On Wednesday, 15 January 2025 at 16:03 +0100, Paul Kocialkowski wrote:
> Would be glad to not have to work on the GStreamer side and focus on
> kernel work instead. So far we can already aim to support:
> - Hantro H1
> - Hantro H2/VC8000E
> - Allwinner Video Engine

And Rockchip VEPUs, which have an open-source software implementation in
libMPP. Most of us have access to reference software for the Hantro
variants; I suppose you have reverse-engineered the Allwinner one?

P.S. There are also Imagination stateless codecs, but I have only seen
them on older TI boards.

> > If you'd like to take a bite, this is a good thread to discuss
> > forward. Until the summer, I planned to reach out to Paul, who made
> > this great presentation [1] at FOSDEM last year, and start moving the
> > RFC into using these ideas. One of the biggest discussions is rate
> > control: it is clear to me that modern HW integrates RC offloading,
> > through some HW-specific knobs or even firmware offloading, and this
> > is what Paul has been putting some thought into.
>
> In terms of RC offloading, what I've seen in the Hantro H1 is a
> checkpoint mechanism that allows making per-slice QP adjustments around
> the global picture QP to fit the bill in terms of size. This can be a
> desirable thing if the use case is to stick to a given bitrate
> strictly.
>
> There are also the regions of interest that are supported by many
> (most?) encoders and allow region-based QP changes (typically as an
> offset). The number of available slots is hardware-specific.

Checkpoints seem unique to Hantro, and they have a lot of limitations,
as a checkpoint is a raster set of blocks. It won't perform well with an
important object in the middle of the scene.

> In addition the H1 provides some extra statistics such as the "average"
> resulting QP when one of these methods is used.

Wasn't that statistic MAD (mean absolute difference), which is basically
the average of the residual values?
In my copy of the VC8000E reference software, all of that has been
commented out and the x265 implementation copied over (remember that you
can pay to use their code in proprietary form, before jumping to license
violation conclusions).

> I guess my initial point about rate control was that it would be easier
> for userspace to be able to choose a rate-control strategy directly and
> to have common implementations kernel-side that would apply to all
> codecs. It also allows leveraging hardware features without userspace
> knowing about them.
>
> However the main drawback is that there will always be a need for a
> more specific/advanced use-case than what the kernel is doing (e.g.
> using an NPU), which would need userspace to have more control over the
> encoder.

Which brings us to the most modern form of advanced rate control. You
will find this in DXVA and Vulkan Video. It consists of splitting the
image into an even grid and allowing a QP difference, either as a delta
or as a quality value, for each element in the grid. The size of that
grid is limited by the HW, and you can implement ROI on top of this too.
Though, if the HW has ROI directly, we don't have much option but to
expose it as such, which is fine. A lot of stateful encoders have that
too, and the controls should be the same.

> So a more direct interface would be required to let userspace do
> rate-control. At the end of the day, I think it would make more sense
> to expose these encoders for what they are and deal with the QP and
> features directly through the uAPI and avoid any kernel-side
> rate-control. Hardware-specific features that need to be configured and
> may return stats would just have extra controls for those.
>
> So all in all we'd need a few new controls to configure the encode for
> codecs (starting with H.264) and also some to provide encode stats
> (e.g. requested QP, average QP). It feels like we could benefit from
> existing stateful encoder controls for various bitstream parameters.

Sounds like we should offer both.
As I stated earlier, modern HW resorts to firmware offloading for
performance reasons. In V4L2, this is even more true: if you read
statistics such as MAD or bitstream size on a frame-by-frame basis, then
you will never queue more than one buffer on the capture side, so the
programming latency (including the RC latency) will directly impact the
encoder throughput. With offloading, the statistics can be handled in
firmware, or at least without any context switch, which improves
throughput.

To be fair, the GStreamer implementation we did for the last RFC runs
frame by frame, using the last frame's size as the statistic. We still
reached the IP performance documented in the white paper. Like
everything else, we don't need all of this in a first uAPI, but we need
to define the minimum "required" features.

> Then userspace would be responsible for configuring each encode run
> with a target QP value, picture type and list of references. We'd need
> to also inform userspace of how many references are supported.

The H1 only has 1 reference + 1 long-term reference (of which only the
single reference was implemented). We used the default reference model,
so there was only one way to manage and pass references. There is
clearly a lot more research to be done around reference management.

Nicolas