Re: [RFC] V4L HDR Architecture Proposal

Hi,

I was wondering what the current status of this is. Is there any plan
to add Dolby Vision support as well?

Thanks

On Wed, Feb 19, 2020 at 2:43 AM Hans Verkuil <hverkuil@xxxxxxxxx> wrote:
>
> On 2/12/20 5:48 PM, Laurent Pinchart wrote:
> > Hi Hans,
> >
> > On Thu, Jan 30, 2020 at 07:00:12AM +0000, Dylan Yip wrote:
> >> On Tuesday, January 28, 2020 11:31 PM, Hans Verkuil wrote:
> >>> On 1/29/20 7:14 AM, Dylan Yip wrote:
> >>>> On Friday, January 24, 2020 4:09 AM, Laurent Pinchart wrote:
> >>>>> On Fri, Jan 24, 2020 at 11:10:06AM +0100, Hans Verkuil wrote:
> >>>>>> On 1/24/20 10:04 AM, Vishal Sagar wrote:
> >>>>>>> On Thursday, January 23, 2020 6:36 PM, Hans Verkuil wrote:
> >>>>>>>> On 1/22/20 9:13 PM, Dylan Yip wrote:
> >>>>>>>>> Hi All,
> >>>>>>>>>
> >>>>>>>>> We are planning to add HDR10 and HDR10+ metadata support into the
> >>>>>>>>> V4L framework and were hoping for some feedback before we started
> >>>>>>>>> implementation.
> >>>>>>>>
> >>>>>>>> Nice!
> >>>>>>>>
> >>>>>>>>> For context, the Xilinx HDMI RX IP currently uses an AXI Lite
> >>>>>>>>> interface where HDR metadata is obtained from a hardware FIFO. To
> >>>>>>>>> access these packets a CPU copy is required.
> >>>>>>>>> We are in the process of migrating towards an AXI MM interface
> >>>>>>>>> where the hardware will directly write HDR metadata into memory.
> >>>>>>>>> Currently the HDMI RX driver
> >>>>>>>>> (https://github.com/Xilinx/hdmi-modules/blob/master/hdmi/xilinx-hdmirx.c)
> >>>>>>>>> is modeled as a v4l subdev. This is linked to a DMA IP which
> >>>>>>>>> utilizes the DMA engine APIs and registers itself as a video node
> >>>>>>>>> for video data.
> >>>>>>>>>
> >>>>>>>>> HDR10 will only consist of static metadata which will come once
> >>>>>>>>> per stream.
> >>>>>>>>> However, HDR10+ will have dynamic metadata which can potentially
> >>>>>>>>> come once per frame and be up to ~4000 bytes. We would like the
> >>>>>>>>> V4L architecture to be flexible enough to support both.
> >>>>>>>>
> >>>>>>>> The key here is the difference between Extended InfoFrames, which
> >>>>>>>> can be long, and the others, which have a maximum size. The latter
> >>>>>>>> should be handled by controls; the former is more difficult.
> >>>>>>>
> >>>>>>> Are you suggesting that we handle static HDR via read-only v4l
> >>>>>>> controls in a meta video node?
> >>>>>>
> >>>>>> Yes. It's very suitable for that purpose.
> >>>>
> >>>> So are you saying we should create a separate metadata node and add the
> >>>> v4l control there or would we add the v4l control to the existing video data
> >>>> node? If it is the former, what's the point of creating the metadata node
> >>>> since we won't qbuf/dqbuf to it?
> >>>
> >>> I'm sorry, I misread your original question. Static HDR should be handled via
> >>> read-only v4l controls in the existing video node, not in a meta video node.
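
For illustration, a minimal userspace sketch of what reading such a
read-only control could look like. The control ID and payload struct
below are hypothetical placeholders (no HDR control exists in mainline
at this point); only the VIDIOC_G_EXT_CTRLS plumbing is the existing
API:

#include <string.h>
#include <sys/ioctl.h>
#include <linux/videodev2.h>

/* Hypothetical payload, mirroring the fields of the HDMI Dynamic Range
 * and Mastering InfoFrame; not a mainline definition. */
struct hdr10_static_metadata {
    __u16 display_primaries_x[3];
    __u16 display_primaries_y[3];
    __u16 white_point_x, white_point_y;
    __u32 max_display_mastering_luminance;
    __u32 min_display_mastering_luminance;
    __u16 max_content_light_level;
    __u16 max_frame_average_light_level;
};

/* Hypothetical control ID, for the sketch only. */
#define V4L2_CID_HDR10_STATIC_METADATA (V4L2_CID_USER_BASE + 0x1000)

int read_hdr10_static(int video_fd, struct hdr10_static_metadata *md)
{
    struct v4l2_ext_control ctrl;
    struct v4l2_ext_controls ctrls;

    memset(&ctrl, 0, sizeof(ctrl));
    memset(&ctrls, 0, sizeof(ctrls));
    ctrl.id = V4L2_CID_HDR10_STATIC_METADATA;
    ctrl.size = sizeof(*md);    /* compound control with a payload */
    ctrl.ptr = md;
    ctrls.which = V4L2_CTRL_WHICH_CUR_VAL;
    ctrls.count = 1;
    ctrls.controls = &ctrl;

    /* Read-only control: userspace only ever issues G_EXT_CTRLS. */
    return ioctl(video_fd, VIDIOC_G_EXT_CTRLS, &ctrls);
}
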
> >>
> >> Ah thanks for the clarification.
> >>
> >>>>>>>> Can you tell a bit more about how the hardware operates? Are all
> >>>>>>>> InfoFrames obtained through the hw fifo, or are some stored in
> >>>>>>>> registers and some go through the fifo?
> >>>>>>>
> >>>>>>> In the current implementation of the HDMI Rx IP, all InfoFrames are
> >>>>>>> read byte by byte from a register that is backed by a FIFO.
> >>>>>>> The register is accessible over an AXI Lite interface.
> >>>>>>> The FIFO can store a maximum of 8 packets. Each packet is 36 bytes in
> >>>>>>> size (31 bytes of data and 5 bytes of ECC calculated by the IP).
> >>>>>>> InfoFrames are one type of packet.
> >>>>>>
> >>>>>> Does one packet correspond to one InfoFrame? Or are they all
> >>>>>> concatenated and hacked up into packets for the FIFO?
> >>>>>>
> >>>>>> This probably won't work well for large Extended InfoFrames of 4kB
> >>>>>> or more: the driver would have to be able to read from the FIFO very
> >>>>>> quickly in order to prevent data from being lost, right? Hence the
> >>>>>> development of the AXIMM interface referred to below.
> >>>>>>
> >>>>>>> There are other types like the General Control Packet, the Audio
> >>>>>>> Clock Regeneration Packet, etc., as listed in Table 5-8 (Packet
> >>>>>>> Types) of the HDMI specification v1.4b.
> >>>>>>>
> >>>>>>> In the future we plan on adding an AXIMM interface in the IP to
> >>>>>>> handle Dynamic HDR. The tentative behavior will be as follows: the
> >>>>>>> driver will provide a buffer pointer to the IP via a register. The
> >>>>>>> IP will dump the InfoFrames' extracted data into this buffer.
> >>>>>>> At frame sync, the IP will write the length of the data into the
> >>>>>>> provided buffer.
> >>>>>>>
> >>>>>>>> Does the hardware set maximum sizes for specific InfoFrames or the
> >>>>>>>> total size of all InfoFrames combined? Or can it be any size?
> >>>>>>>>
> >>>>>>> Hope the above info about the FIFO depth of the current HDMI Rx IP
> >>>>>>> answers this.
> >>>>>>
> >>>>>> Right, so the driver will provide the maximum size for all
> >>>>>> InfoFrames that can occur between two video frames.
> >>>>>>
> >>>>>> And the driver will parse the received InfoFrames.
> >>>>>>
> >>>>>> I am strongly leaning towards using a control for the HDR10+
> >>>>>> InfoFrame as well: it fits well with the Request API where controls
> >>>>>> can be cleanly associated with a specific video frame, and the
> >>>>>> amount of data isn't that large.
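
To make the Request API suggestion concrete, a sketch of reading a
per-frame control back from a completed request. The HDR10+ control ID
is again a hypothetical placeholder; the request_fd plumbing is the
existing Request API:

#include <sys/ioctl.h>
#include <linux/videodev2.h>

/* Hypothetical per-frame compound control carrying the HDR10+ payload. */
#define V4L2_CID_HDR10PLUS_METADATA (V4L2_CID_USER_BASE + 0x1001)

/* Once the request associated with a captured frame has completed,
 * read the control value as it was for exactly that frame. */
int read_hdr10plus_from_request(int video_fd, int request_fd,
                                void *payload, __u32 payload_size)
{
    struct v4l2_ext_control ctrl = {
        .id = V4L2_CID_HDR10PLUS_METADATA,
        .size = payload_size,
        .ptr = payload,
    };
    struct v4l2_ext_controls ctrls = {
        .which = V4L2_CTRL_WHICH_REQUEST_VAL,
        .request_fd = request_fd,
        .count = 1,
        .controls = &ctrl,
    };

    return ioctl(video_fd, VIDIOC_G_EXT_CTRLS, &ctrls);
}
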
> >>>>>
> >>>>> This however leads me to a simple question: why do we have a metadata
> >>>>> API in the first place if everything should go through controls?
> >>
> >> I have the same concern as Laurent here. Why are we supporting HDR
> >> metadata through controls but using the metadata API for other types
> >> of metadata? Wouldn't it be cleaner to follow the existing metadata
> >> API since HDR is a type of metadata?
>
> There is very little need to expose InfoFrames to userspace: most of the
> current InfoFrame data is consumed by the video capture driver. That's
> how it is done in all existing HDMI video receivers. Any relevant data
> for userspace such as colorspace information is exposed via V4L2 data
> structures.
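
Concretely, what a receiver driver parses out of the AVI InfoFrame
surfaces through the negotiated pixel format rather than as raw
InfoFrame bytes, e.g.:

#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/videodev2.h>

/* Colorimetry extracted from the AVI InfoFrame by the capture driver
 * is reported in the v4l2_pix_format fields. */
void print_colorimetry(int video_fd)
{
    struct v4l2_format fmt;

    memset(&fmt, 0, sizeof(fmt));
    fmt.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    if (ioctl(video_fd, VIDIOC_G_FMT, &fmt))
        return;

    printf("colorspace %u ycbcr_enc %u quantization %u xfer_func %u\n",
           fmt.fmt.pix.colorspace, fmt.fmt.pix.ycbcr_enc,
           fmt.fmt.pix.quantization, fmt.fmt.pix.xfer_func);
}
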
>
> There are only a few InfoFrames whose data needs to be exposed to userspace:
> the SPD InfoFrame (not implemented today in a driver, but perfectly suited
> for controls since this rarely changes), and the HDR metadata that you need.
>
> Also, all InfoFrame hardware implementations that I have seen expose the
> InfoFrame data through registers (often over an i2c bus), and that's
> really not suitable for the metadata API.
>
> Now, the Extended InfoFrame information is a different beast, and there
> the metadata API is a reasonable approach. Not the only approach, perhaps,
> but certainly something to consider.
>
> So suppose we support this as a metadata device node; then it is
> an option to just dump all InfoFrames into the buffer. I.e., the pixel
> format used to define the format of the stream would say that it consists
> of all InfoFrames preceding the video frame concatenated together, and
> you have to parse it in userspace to find the InfoFrame types that you
> need. It is up to the driver to decide if all InfoFrames or only a subset
> are copied into the buffer.
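
A minimal sketch of that userspace parsing, assuming the (not yet
defined) metadata format is simply the raw InfoFrames back to back,
each with its 3-byte header (HB0 type, HB1 version, HB2 payload
length); the framing is an assumption pending a real format definition:

#include <stdint.h>
#include <stddef.h>

/* Walk a buffer of concatenated InfoFrames and return the first one of
 * the wanted type, e.g. 0x87 for the Dynamic Range and Mastering
 * InfoFrame. */
static const uint8_t *find_infoframe(const uint8_t *buf, size_t len,
                                     uint8_t wanted_type)
{
    size_t pos = 0;

    while (pos + 3 <= len) {
        uint8_t type = buf[pos];
        uint8_t payload_len = buf[pos + 2];

        if (pos + 3 + payload_len > len)
            break;                      /* truncated record */
        if (type == wanted_type)
            return &buf[pos];
        pos += 3 + payload_len;         /* skip to the next record */
    }
    return NULL;
}
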
>
> But be aware that the video driver always has to parse the data as well
> in order to extract the AVI InfoFrame and a few others in order to
> program the video receiver.
>
> The only reason IMHO to dump all InfoFrames into the buffer is if that's
> how the DMA engine operates. If you need to manually do this, then only
> copy the Extended InfoFrames.
>
> >>
> >> This is why we were originally thinking that a 2-node approach (1 for
> >> video, 1 for metadata) would have been cleaner.
> >
> > I would also like to add that the implementation of the request API is
> > currently not able to support inline devices (with a live source). This
>
> For traditional non-MC drivers this is supported in the request API. Only
> for MC drivers is this a problem. I don't know if this upcoming driver is
> going the MC route.
>
> > could be changed, but would require a very significant amount of work if
> > we want to implement it properly. As we already have a metadata capture
> > API, I really don't see why this should go through the control
> > framework.
> >
> > Regarding HDR10 static metadata, while I agree that controls should work
> > for this purpose, I think we also need to take into account API
> > consistency. If we use the metadata API for dynamic metadata, we will
> > end up with two different APIs to carry two different types of metadata.
> > Is this desirable, or, given that we already plan for HDR10+ support,
> > would implementing HDR10 support through the metadata API be a better
> > solution for userspace?
>
> The Dynamic Range and Mastering InfoFrame also needs to be processed by
> the driver, so it makes much more sense to have the driver expose the
> relevant data through one or more controls.
>
> That does not prevent you from dumping this in the metadata buffer as
> well (as described above), but you really only want to do that if that's
> the most efficient way for the DMA engine. But if the buffer is in
> non-coherent memory, then you need to sync the buffer to the CPU before
> the driver can read it.
>
> So, in conclusion, I think it is OK to use a metadata device for the
> Extended InfoFrames. I don't think it is a good idea to dump *all*
> InfoFrames in the metadata buffer, especially if the driver has to
> sync the buffer to the CPU in order to obtain the non-Extended-InfoFrame
> information.
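
The userspace side of such a metadata node would then follow the
existing metadata capture API. A sketch, with a hypothetical fourcc
standing in for the yet-to-be-defined "concatenated Extended
InfoFrames" format:

#include <string.h>
#include <sys/ioctl.h>
#include <linux/videodev2.h>

/* Hypothetical fourcc; no InfoFrame metadata format is defined yet. */
#define V4L2_META_FMT_IF_EXT v4l2_fourcc('I', 'F', 'E', 'X')

int setup_meta_node(int meta_fd)
{
    struct v4l2_capability cap;
    struct v4l2_format fmt;

    if (ioctl(meta_fd, VIDIOC_QUERYCAP, &cap))
        return -1;
    if (!(cap.device_caps & V4L2_CAP_META_CAPTURE))
        return -1;              /* not a metadata capture node */

    memset(&fmt, 0, sizeof(fmt));
    fmt.type = V4L2_BUF_TYPE_META_CAPTURE;
    if (ioctl(meta_fd, VIDIOC_G_FMT, &fmt))
        return -1;
    fmt.fmt.meta.dataformat = V4L2_META_FMT_IF_EXT;
    if (ioctl(meta_fd, VIDIOC_S_FMT, &fmt))
        return -1;

    /* From here on it is the usual REQBUFS/QBUF/STREAMON/DQBUF cycle,
     * one metadata buffer dequeued per captured video frame. */
    return 0;
}
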
>
> >
> >>>>>> That said, some work in the control framework is probably needed to
> >>>>>> streamline things a bit:
> >>>>>>
> >>>>>> 1) it should be possible to increase the size of compound controls
> >>>>>> later if new fields are added. This is on the TODO list already since
> >>>>>> it is desired functionality for codecs.
> >>>>>>
> >>>>>> 2) tentative, needs research first: add some sort of mechanism to
> >>>>>> mmap the control payload to avoid mem copies. That would make controls
> >>>>>> much more useful for large metadata.
> >>>>>
> >>>>> Let's not forget that we would then also need to mmap the control per
> >>>>> request, which will become challenging if we want to be able to
> >>>>> pre-map everything like we do for buffers instead of mapping and
> >>>>> unmapping for every request.
> >>
> >> Same concern here. If we want to pre-map everything like buffers,
> >> wouldn't we essentially be replicating the behavior of buffers? Then
> >> the only difference would be that we are doing g_ctrl instead of
> >> qbuf/dqbuf, right?
>
> Right. But qbuf/dqbuf has quite a bit of overhead as well (dealing with
> multiple streaming modes, cache management), and while that is perfectly
> acceptable when dealing with large buffers and DMA engines, I wonder if
> an mmap-ed control for mid-sized data (say up to 32/64 kBytes) isn't more
> efficient.
>
> That said, this functionality does not exist, and it would need some
> performance testing as well, comparing the two. It's just a gut feeling
> without any hard proof.
>
> Regards,
>
>         Hans
>
> >
> > I couldn't have said that better :-)
> >
> >>>>>> I'm not sure when I will have time to work on that, though.
> >>>>>>
> >>>>>>>> Does it accept any InfoFrame or only specific InfoFrame types? Or
> >>>>>>>> is this programmable?
> >>>>>>>
> >>>>>>> HDMI Rx IP accepts all types of InfoFrames.
> >>>>>>>
> >>>>>>>>> We have 2 different proposals that we believe will work:
> >>>>>>>>>
> >>>>>>>>> A. 2 video node approach (1 for video, 1 for metadata) - This will
> >>>>>>>>>    align with the current v4l metadata structure (e.g., uvc) but
> >>>>>>>>>    will require our HDMI RX driver to register a subdev and
> >>>>>>>>>    device node
> >>>>>>>>>       a. Our HDMI RX driver will register a v4l subdev (for
> >>>>>>>>>          video data) and a metadata node
> >>>>>>>>>               i. Is this acceptable?
> >>>>>>>>>       b. Applications will qbuf/dqbuf to both video and metadata nodes
> >>>>>>>>>          for each frame
> >>>>>>>>>
> >>>>>>>>> B. 1 video node approach - This will avoid mixing v4l subdev and
> >>>>>>>>>    v4l device node functionality inside HDMI RX driver but it
> >>>>>>>>>    strays from current v4l metadata architecture and also
> >>>>>>>>>    changes v4l subdev functionality
> >>>>>>>>>       a. We would add a "read" function to v4l subdevs
> >>>>>>>>>               i. This will also require us to add some "capabilities" field
> >>>>>>>>>                  to subdev or be able to query for the "read" function
> >>>>>>>>>       b. HDMI Rx driver will register a v4l subdev with "read"
> >>>>>>>>>          function/capability
> >>>>>>>>>       c. Application can directly pass a buffer in the "read" function
> >>>>>>>>>          to HDMI RX subdev to obtain HDR metadata
> >>>>>>>>>               i. We will need to pass the subdev name from the
> >>>>>>>>>                  application or be able to query all subdevs for this
> >>>>>>>>>                  "read" capability. Is this acceptable?
> >>>>>>>>>
> >>>>>>>>> Please let me know your opinions on which approach is best or
> >>>>>>>>> propose another approach if these 2 are unfit. Thanks
> >
>


