Re: per-frame camera metadata (again)

Hi Sakari,

On Sat, 19 Dec 2015, Sakari Ailus wrote:

> Hi Guennadi and Hans,
> 
> Hans Verkuil wrote:
> > On 12/16/15 10:37, Guennadi Liakhovetski wrote:
> > > Hi all,
> > > 
> > > A project I am currently working on requires acquiring per-frame
> > > metadata from the camera and passing it to user-space. This is not the
> > > first time this has come up and I know such discussions have been held
> > > before. A typical user is Android (also my case), where you have to
> > > provide the parameter values that have been used to capture a specific
> > > frame to the user. I know Hans is working to handle one side of this
> > > process - sending per-request controls,
> > 
> > Actually, the request framework can do both sides of the equation: giving
> > back metadata in read-only controls that are per-frame. While ideally the
> > driver would extract the information from the binary blob and put it in
> > nice controls, it is also possible to make a control that just contains
> > the binary blob itself. Whether that's a good approach depends on many
> > factors and that's another topic.
> 
> I think that could be possible in some cases. If you don't have a lot of
> metadata, then, sure.
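
For concreteness, I imagine such a blob control could be registered on the
driver side roughly as follows - a rough sketch only; the control ID, the
blob size and the ops below are made up:

    static const struct v4l2_ctrl_config meta_blob_ctrl = {
        .ops   = &my_ctrl_ops,                 /* hypothetical driver ops */
        .id    = V4L2_CID_USER_BASE + 0x1000,  /* placeholder ID */
        .name  = "Sensor Metadata",
        .type  = V4L2_CTRL_TYPE_U8,            /* byte-array payload */
        .min   = 0,
        .max   = 0xff,
        .step  = 1,
        .dims  = { 4096 },                     /* assumed blob size */
        .flags = V4L2_CTRL_FLAG_READ_ONLY |
                 V4L2_CTRL_FLAG_VOLATILE,
    };

    /* register it with the driver's control handler */
    v4l2_ctrl_new_custom(&dev->ctrl_handler, &meta_blob_ctrl, NULL);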
> 
> > > but I'm not aware whether he or anyone else
> > > is actively working on this already or is planning to do so in the near
> > > future. I also know that several proprietary solutions have been
> > > developed and are in use in various projects.
> > > 
> > > I think the general agreement has been that such data has to be passed
> > > via a buffer queue. But there are a few possibilities there too. Below
> > > are some:
> > > 
> > > 1. Multiplanar. A separate plane is dedicated to metadata. Pros: (a)
> > > metadata is already associated with the specific frame it corresponds
> > > to. Cons: (a) a correct implementation would specify the image plane
> > > fourcc separately from any metadata plane format description, but we
> > > currently don't support per-plane format specification.
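
To make con (a) concrete: in the current uAPI the fourcc sits at the top
level of the multiplanar format, so a metadata plane cannot describe its
own layout. A sketch against the existing structures:

    struct v4l2_format fmt = {
        .type = V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE,
    };

    /* One pixelformat covers every plane... */
    fmt.fmt.pix_mp.pixelformat = V4L2_PIX_FMT_NV12M;
    /*
     * ...while struct v4l2_plane_pix_format only carries sizeimage and
     * bytesperline, leaving no room for a per-plane (metadata) fourcc.
     */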
> > 
> > This only makes sense if the data actually comes in via DMA and if it is
> > large enough to make it worth the effort of implementing this. As you say,
> > it will require figuring out how to do per-plane fourccs.
> > 
> > It also only makes sense if the metadata comes in at the same time as the
> > frame.
> 
> I agree. Much of the time the metadata indeed arrives earlier than the
> rest of the frame. Neither the frame layout nor the use cases should be
> assumed in the bridge (ISP) driver which implements the interface,
> essentially forcing this on the user. This is a major drawback of the
> approach.
> 
> That said, if you combine this with the need to pass buffer data to the
> user before the entire buffer is ready, i.e. as soon as a plane is ready,
> you could get around this quite neatly.
> 
> However, if the DMA engine writing the metadata is different from the one
> writing the image data to memory, then you have a plain metadata buffer ---
> as it's a different video node. But there's really nothing special about
> that then.
> 
> Conceptually we should support multi-part frames rather than just
> metadata; metadata is only one use case where a single DMA engine outputs
> multiple kinds of data. This could be statistics as well. Or multiple
> images, e.g. YUV and RAW format images of the same frame.

If you stream different kinds of images (raw, yuv), then using multiple 
nodes is rather straightforward, isn't it? Whereas for statistics and 
metadata, if we do that, do we assign a new FOURCC code for each new such 
data layout?
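
E.g. something along these lines - purely made-up codes, just to
illustrate the question:

    /* Hypothetical; no such codes exist in the uAPI today. */
    #define V4L2_META_FMT_SMIA8  v4l2_fourcc('S', 'M', 'I', '8') /* SMIA embedded data, 8 bpp */
    #define V4L2_META_FMT_STATS  v4l2_fourcc('S', 'T', 'A', 'T') /* ISP statistics blob */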

> 
> With CSI-2, as the virtual channels are independent, one could start and
> stop them at different times, and the frame rates in those channels could
> be unrelated as well. This suggests that different virtual channels should
> be conceptually separate streams in V4L2, too, and thus the data from
> different streams should not end up in the same buffer.
> 
> Metadata rarely (if ever?) arrives on a separate virtual channel though.
> So this isn't something that necessarily needs to be taken into account
> right now, but it's good to be aware of it.

A camera can send image data and metadata on the same virtual channel, but 
then it should use different data types for them?

> > > 2. Separate buffer queues. Pros: (a) no need to extend multiplanar buffer
> > > implementation. Cons: (a) more difficult synchronisation with image
> > > frames, (b) still need to work out a way to specify the metadata version.
> 
> Do you think you have different versions of metadata from a sensor, for
> instance? Based on what I've seen these tend to be sensor specific, or
> SMIA, which defines a metadata type for each bit depth for compliant
> sensors.
> 
> Each metadata format should have a 4cc code: an SMIA bit-depth-specific
> one, or a sensor-specific one where the metadata is sensor specific.
> 
> Kinds of metadata other than what you get from sensors are not covered by
> the thoughts above.
> 
> <URL:http://www.retiisi.org.uk/v4l2/foil/v4l2-multi-format.pdf>
> 
> I think I'd still favour separate buffer queues.

And separate video nodes then.
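
On the userspace side that would look roughly like this - a rough sketch,
where the dedicated metadata node (and whatever buffer type it would use)
is hypothetical, and the two queues are matched up via the buffer sequence
numbers:

    struct v4l2_buffer img = {
        .type   = V4L2_BUF_TYPE_VIDEO_CAPTURE,
        .memory = V4L2_MEMORY_MMAP,
    };
    struct v4l2_buffer meta = img; /* dequeued from the metadata node */

    if (ioctl(video_fd, VIDIOC_DQBUF, &img) == 0 &&
        ioctl(meta_fd, VIDIOC_DQBUF, &meta) == 0 &&
        img.sequence == meta.sequence) {
        /* the metadata buffer describes this image frame */
    }

This also makes the synchronisation cost of option 2 visible: userspace
has to pair the two queues itself.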

> > > Any further options? Of the above my choice would go with (1) but with a
> > > dedicated metadata plane in struct vb2_buffer.
> > 
> > 3. Use the request framework and return the metadata as control(s).
> > Since controls can be associated with events when they change, you can
> > subscribe to such events. Note: currently I haven't implemented such
> > events for request controls since I am not certain how they would be
> > used, but this would be a good test case.
> > 
> > Pros: (a) no need to extend the multiplanar buffer implementation,
> > (b) syncing up with the image frames should be easy (both use the same
> > request ID), (c) a lot of freedom in how to export the metadata.
> > Cons: (a) the request framework is still work in progress (currently
> > worked on by Laurent), (b) probably too slow for really large amounts
> > of metadata; you'll need proper DMA handling for that, in which case I
> > would go for option 2.
> 
> Agreed. You could also consider it a drawback that the number of new
> controls required for this could be large, but then for other reasons the
> second option mentioned would be the better implementation anyway.

But wouldn't a single extended control, carrying all the metadata-transferred 
control values in one go, solve the performance issue?
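
Userspace could then fetch the whole blob in a single VIDIOC_G_EXT_CTRLS
call, roughly like this (the control ID is made up):

    __u8 blob[4096];
    struct v4l2_ext_control ctrl = {
        .id   = V4L2_CID_METADATA_BLOB, /* hypothetical */
        .size = sizeof(blob),
        .p_u8 = blob,
    };
    struct v4l2_ext_controls ctrls = {
        .count    = 1,
        .controls = &ctrl,
    };

    ioctl(fd, VIDIOC_G_EXT_CTRLS, &ctrls);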

> > > In either of the above options we also need a way to tell the user what
> > > is in the metadata buffer, i.e. its format. We could create new FOURCC
> > > codes for them, perhaps as V4L2_META_FMT_..., or the user space could
> > > identify the metadata format based on the camera model and an opaque
> > > type (metadata version code) value. Since metadata formats seem to be
> > > extremely camera-specific, I'd go with the latter option.
> 
> I think I'd use separate 4cc codes for the metadata formats when they really
> are different. There are plenty of possible 4cc codes we can use. :-)
> 
> Documenting the formats might be painful though.

The advantage of this approach together with a separate video node / 
buffer queue is that no changes to the core would be required.

At the moment I think that using (extended) controls would be the most 
"correct" way to implement that metadata, but you can only associate such 
control values with frames once the request API is there. Yet another 
caveat is that we define V4L2_CTRL_ID2CLASS() as ((id) & 0x0fff0000UL) 
and V4L2_CID_PRIVATE_BASE as 0x08000000, so that drivers cannot define 
private controls to belong to existing classes. Was this intentional?
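
To spell out the arithmetic:

    #define V4L2_CTRL_ID2CLASS(id)  ((id) & 0x0fff0000UL)
    #define V4L2_CID_PRIVATE_BASE   0x08000000

    /*
     * Every private control ID has bit 27 set, so e.g.
     * V4L2_CTRL_ID2CLASS(V4L2_CID_PRIVATE_BASE + 1) == 0x08000000,
     * which can never equal an existing class such as
     * V4L2_CTRL_CLASS_CAMERA (0x009a0000).
     */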

Thanks
Guennadi

> > > Comments extremely welcome.
> > 
> > What I like about the request framework is that the driver can pick apart
> > the metadata and turn it into well-defined controls. So the knowledge of
> > how to do that is in the place where it belongs. In cases where the
> > metadata is simply too large for that to be feasible, I don't have much
> > of an opinion. Camera + version could be enough, although the same can
> > just as easily be encoded as a fourcc (V4L2_META_FMT_OVXXXX_V1, _V2,
> > etc.). A fourcc is more consistent with the current API.
> 
> -- 
> Kind regards,
> 
> Sakari Ailus
> sakari.ailus@xxxxxxxxxxxxxxx
> 


