On Thursday, February 10, 2022 at 16:35 -0500, Nicolas Dufresne wrote:
> Thanks for the feedback, answers below.
>
> On Wednesday, February 9, 2022 at 14:39 +0200, Stanimir Varbanov wrote:
> > Hi Nicolas,
> >
> > On 2/4/22 22:17, Nicolas Dufresne wrote:
> > > Hi Stanimir,
> > >
> > > I know you were looking for some way to pass back per-frame data
> > > post encoding in V4L2 a while ago, but I think I lost track. I
> > > *think* you were looking for a way to pass back HDR10+ metadata
> > > from the decoder. I'm currently trying to design a
> >
> > The size of HDR10plus metadata (taken from ffmpeg, ATSC 2094-40) is
> > ~12KB. The other encoder metadata which I also want to have is
> > encoder ROI, which depends on the encoding resolution; my
> > calculations show that for 8K resolution I need a metadata buffer of
> > 272KB (H.264) and 68KB for HEVC. With the numbers above I want to
> > say that we need some scalable solution for input/output metadata.
> > V4L2 controls are not such a solution. My experiments with passing
> > raw data (16/32KB) through a v4l2 compound control show that the
> > copy to/from userspace is interrupted many times, which badly
> > impacts performance at higher framerates (>= 460 fps).
>
> It would still be copied, but we are dogfooding the dynamic array
> control support for stateless HEVC and AV1 decoding, so you can pass
> just the data you need rather than always copying the theoretical
> maximum. Would such a mechanism get you closer to your goal?
>
> > > a way to pass back SVC encoded frame info (layer(s)_id and
> > > references). This is somewhat similar: for each frame being
> > > encoded I need some information from the encoder, so I can pass it
> > > back to the RTP payloader. This issue was fixed in AV1, but VP9 is
> > > still pretty important.
> > >
> > > On my side, I was thinking that a driver could stack the data per
> > > encoded buffer internally, and update a control state at
> > > DQBUF(capture) time.
> > > This should not be racy, as the next update will stay pending
> > > till the next DQBUF, but I'm worried about the overhead and maybe
> > > the complexity.
> >
> > What is the size of the data you want to pass from kernel to
> > userspace?
>
> It's equivalent to the VP9 RTP SS structure. The size seems to be:
>
> https://datatracker.ietf.org/doc/html/draft-ietf-payload-vp9-16#section-4.2.1
>
> 2 + (2 + 2) × (N_S + 1) + (1 + R) × N_G
>
> N_S: the number of spatial layers minus 1 (usually 3 or less)
> R: the number of references for 1 picture (max 3)
> N_G: the size of a picture group (not sure what the max is)
>
> This should be relatively small (it must fit in 1 UDP packet, so it
> can't really get that big), but it does have a slightly dynamic size.
> I will work more on this when I draft the control(s) for it. I haven't
> even decided yet whether it will be 1 compound control, or whether
> it's worth splitting into 2 dynamic arrays.
>
> > Even if it is not racy at 60fps, at an actual 460fps and beyond it
> > will be. That's why I think v4l2 controls are not an option for the
>
> I would be interested in your demonstration of the raciness of such a
> mechanism, please. What was suggested is basically to have a FIFO in
> the driver, and to pop from the FIFO synchronously when userland calls
> DQBUF. If userland dequeues and reads the control in the same thread,
> I have a hard time believing the framerate will have any impact on
> whether that is racy or not. I could of course be missing the obvious;
> it happens to me all the time.
>
> > near future. The only clean option (not adding additional complexity
> > in the client for data <-> metadata synchronization) is to have a
> > metadata buffer attached to the data buffer. The other option is a
> > separate video node for metadata, but I'm not happy with this - it
> > complicates driver and client implementations. And the third option
> > is to change the request API and the v4l2 controls framework to deal
> > with dmabuf instead of copy to/from user.
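As a sanity check on the SS size formula quoted earlier in the thread, a minimal sketch; the function name `vp9_ss_size` is mine, for illustration only:

```c
/*
 * Worst-case byte size of a VP9 RTP scalability structure (SS), per
 * draft-ietf-payload-vp9-16 section 4.2.1:
 *
 *   2 + (2 + 2) * (N_S + 1) + (1 + R) * N_G
 *
 * n_s: number of spatial layers minus 1 (usually 3 or less)
 * r:   number of references per picture (max 3)
 * n_g: number of pictures in a picture group
 */
unsigned int vp9_ss_size(unsigned int n_s, unsigned int r, unsigned int n_g)
{
	return 2 + (2 + 2) * (n_s + 1) + (1 + r) * n_g;
}
```

For example, 3 spatial layers (n_s = 2), 3 references and an 8-picture group give 2 + 12 + 32 = 46 bytes, comfortably inside one UDP packet, which supports the point that this stays small even though the size is dynamic.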
>
> I understand your "all in" request; with my less-than-1KB metadata,
> I'm unfortunately unlikely to be the one designing for it. But I can
> give you my feedback based on my current understanding of all these
> pieces.
>
> 1. Metadata attached to the data buffer:
>
> Yes, you can do that for a capture device with data_offset, though
> there is no mechanism in place to actually signal the type, or even
> to partition this data (in case you have multiple pieces of data).
> You'd need to figure out a way to keep the partition stable,
> otherwise you'll get back to the initial problem.
>
> In CODECs, when we output reference frames directly, there is often
> some extra space at the end which holds some saved state, though in
> an ideal world we'd prevent this area from being mappable by the
> application. But extra space at the end is quite a similar idea and
> perhaps equally valid.
>
> 2. Separate video nodes:
>
> I can't say yes to that one, unless you already have a plan. An m2m
> session is created after opening 1 video node. I can't think of a way
> we could get a second open() call on a second node to become part of
> the initial m2m session. Though, if you do have ideas, I think this
> method is quite powerful.
>
> Having the ability to join multiple dev nodes into one m2m session
> would mean m2m devices are no longer limited to 1 stream. This would
> fix a huge limitation we are facing with V4L2 m2m drivers. A lot of
> cloud-oriented decoders seem to include a multi-scaler feature, and
> other platforms have multi-scaler m2m types of devices, without any
> fixed limitation on the number of instances.
>
> For your case, we could use the same mechanism as, let's say, UVC
> uses to pass metadata. As you said, that would add a bit more work
> for userland, having to poll multiple queues, but some metadata could
> be documented to be produced in pair with the main queue, so at least
> there is no significant time difference to be expected, and userland
> can poll both and DQBUF from both in a row.
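For option 1 above, here is one possible userland-side partitioning sketch. This is NOT an existing V4L2 mechanism; it assumes a hypothetical convention where the driver writes the frame in the first `sizeimage` bytes of the plane and appends metadata right after, with `bytesused` covering both. The missing piece, as noted above, is exactly how to signal and keep such a partition stable:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical "extra space at the end" convention: metadata occupies
 * the bytes between the format's sizeimage and the plane's bytesused.
 * Nothing in V4L2 currently guarantees this layout.
 */
struct meta_region {
	const uint8_t *data;
	size_t len;
};

struct meta_region plane_metadata(const uint8_t *plane,
				  uint32_t bytesused,
				  uint32_t sizeimage)
{
	struct meta_region m = { NULL, 0 };

	if (bytesused > sizeimage) {
		m.data = plane + sizeimage;
		m.len = bytesused - sizeimage;
	}
	return m;
}
```

With this convention, a driver that appends nothing simply reports `bytesused == sizeimage` and userland sees an empty metadata region.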
>
> I'd be very happy if anyone (adding Hans in direct CC now) had an
> idea for the main problem that would need to be solved, since this
> would solve a larger issue. But yes, it looks like it complicates
> drivers (managing more VB2 queues) and managing topologies; perhaps
> some will start using subdevs that would need to be part of the
> sessions (imho this is starting to create too many devices to make
> sense, though).
>
> 3. DMABuf type of controls using Requests
>
> Just keep in mind that Requests are bound to OUTPUT buffers only. It
> is possible with a decoder to associate the CAPTURE buffer with the
> OUTPUT buffer (using the timestamp), and that allows userland to
> associate it with the original request. No blocker, just a reminder
> that this is not as trivial as it may look for userland.
>
> Requests are basically a fancy FIFO where you can also remove items
> from a random point. You can skip reading certain data, and they also
> let you read data out of order (as long as you haven't discarded the
> associated **OUTPUT** buffer; just a reminder, since the fact that
> the data is bound to the bitstream buffer is far from ideal - you end
> up being forced to allocate more bitstream buffers just to hold on to
> the metadata longer).
>
> I personally think that bitstream-associated requests are going to be
> a pain for CODEC metadata, especially for stateful decoders, as it
> forces userland into doing things that make little sense. This is
> exactly why I'm trying my luck here, proposing a simpler mechanism,
> but one that requires reading controls in order.

Please ignore the part about output requests. I missed this patch:

[RFC PATCHv2 08/11] v4l2-mem2mem.c: allow requests for capture queues

In fact, a request is either write-only or read-only. Typically,
read-only requests are to be used on the capture queue. The request
could be allocated and queued with the capture buffer, which voids the
complexity I just mentioned here.

Sorry for the noise.
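The CAPTURE-to-OUTPUT association mentioned above works because stateful decoders copy the OUTPUT (bitstream) buffer timestamp to the CAPTURE buffers produced from it. A minimal userland-side sketch of keying per-request state by that timestamp (the table and function names are hypothetical helpers, not a V4L2 API):

```c
#include <stdint.h>
#include <sys/time.h>

/*
 * Hypothetical userland bookkeeping: remember the request fd when the
 * OUTPUT buffer is queued, look it up again when the matching CAPTURE
 * buffer (carrying the copied timestamp) is dequeued.
 */
#define MAX_PENDING 32

struct pending {
	uint64_t ts;     /* timestamp, flattened to nanoseconds */
	int request_fd;  /* request bound to the OUTPUT buffer */
	int used;
};

static struct pending table[MAX_PENDING];

static uint64_t tv_to_ns(struct timeval tv)
{
	return (uint64_t)tv.tv_sec * 1000000000ull +
	       (uint64_t)tv.tv_usec * 1000ull;
}

/* Called after queuing the OUTPUT buffer with its request. */
int remember_request(struct timeval ts, int request_fd)
{
	for (int i = 0; i < MAX_PENDING; i++) {
		if (!table[i].used) {
			table[i].ts = tv_to_ns(ts);
			table[i].request_fd = request_fd;
			table[i].used = 1;
			return 0;
		}
	}
	return -1; /* table full */
}

/* Called after dequeuing the CAPTURE buffer; consumes the entry. */
int lookup_request(struct timeval ts)
{
	uint64_t key = tv_to_ns(ts);

	for (int i = 0; i < MAX_PENDING; i++) {
		if (table[i].used && table[i].ts == key) {
			table[i].used = 0;
			return table[i].request_fd;
		}
	}
	return -1; /* no match */
}
```

This illustrates the extra indirection the email warns about: userland has to carry this mapping itself, which is part of why bitstream-bound requests are awkward for metadata.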
>
> Now, if you can upgrade a control to hold a dmabuf (or a memfd, I
> guess that depends on the source of that metadata), you get an
> optimization over my approach. There were some concerns raised, I
> believe by Tomasz, regarding writable controls that would use DMABuf.
> A few things need to be resolved:
>
> - Where do you allocate? (dmabuf heap?)
> - Which type of memory should it contain for this driver, for this
>   control?
> - Do we strictly import?
> - Is there a security concern for controls if the data can be edited
>   after being set by userland?
>
> I think we can split the problem: for smaller (aka SVC) metadata, the
> DMABuf optimization is probably not as big, but the workflow issue is
> entirely shared.
>
> Nicolas
>
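On the allocation question left open above: for CPU-produced metadata, a memfd is one candidate backing store. A sketch of creating and sizing one for the worst-case 272KB H.264 ROI metadata mentioned earlier; nothing V4L2-specific, and the helper name is mine:

```c
#include <sys/mman.h>
#include <unistd.h>

/* memfd_create() needs _GNU_SOURCE; declare it ourselves if the
 * headers did not (glibc >= 2.27 provides the symbol either way). */
#ifndef MFD_CLOEXEC
#define MFD_CLOEXEC 0x0001U
int memfd_create(const char *name, unsigned int flags);
#endif

/*
 * Allocate an anonymous, fd-backed buffer sized for the worst-case
 * metadata. Returns the fd, or -1 on failure.
 */
int alloc_meta_fd(size_t worst_case)
{
	int fd = memfd_create("v4l2-meta", MFD_CLOEXEC);

	if (fd < 0)
		return -1;
	if (ftruncate(fd, (off_t)worst_case) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}
```

Whether a control could import such an fd (and whether sealing via `fcntl(F_ADD_SEALS)` would address the "edited after being set" concern) is exactly the open design question from the list above.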