Hi all,
This is an update to the original Frame format descriptors RFC I posted
back in February last year:
<URL:http://www.spinics.net/lists/linux-media/msg44629.html>
Since then, limited frame format descriptor support has been added to
mainline, supporting a subset of the potential use cases:
<URL:http://www.spinics.net/lists/linux-media/msg53790.html>
I believe that such a simple get/set-type interface isn't generic
enough for manipulating frame descriptors; a more expressive interface
is required. This RFC does not address setting frame descriptors: most
devices still do not allow changing the frame descriptor, and the
behaviour of the get operation should not depend on how the descriptor
is constructed.
Background
==========
I want to first list the known use cases. There are a number of
variations of these use cases that it would be nice to support. What
can be supported depends not only on the sensor but also on the
receiver driver, i.e. on how it is able to handle the data it receives.
1. Sensor metadata. Sensors produce interesting kinds of metadata.
Typically the metadata format is very hardware specific. The metadata is
known to consist of e.g. register values or floating point numbers
describing the sensor state. The metadata may have a certain length, or
it may span a few lines at the beginning or the end of the frame, or both.
2. JPEG images or other compressed data. JPEG images are produced by
some sensors either separately or combined with the regular image data
frame. The data type is always octets for these formats.
2.1. Compressed image with defined width and height for the benefit of
receivers that do not support variable size (or JPEG) images.
3. Interleaved YUV and JPEG data. Separating the two may only be done in
software, so the driver has no option but to treat both as blobs.
4. Regular image data frames. Described by struct v4l2_mbus_framefmt
already.
5. Multi-format images. See the end of the message for more information.
Some busses, such as CSI-2, are able to transport some of this data on
separate channels. This provides logical separation of the different
parts of the frame while still sharing the same physical bus. However,
most sensors are known to send the metadata on the same channel as the
regular image data frame; this could be related to limitations in some
CSI-2 receiver implementations.
It should thus not be assumed that, even if a given bus provides logical
separation between the different parts of the image, that feature is
actually used. Instead, the width and height fields are to be used for
this purpose: the entries appear in the structure in the same order as
they are sent by the sensor. There must be no overlap or redundancy
between the descriptor entries.
The frame descriptor may change as a result of an action performed by
the user, such as changing the pad format from user space. The frame
format must thus be queried from the transmitting device before starting
streaming. Changing the frame descriptor while streaming is not allowed.
This leads me to think we need two relatively independent things: a way
to describe the frame format and a way to deliver the non-image part of
the frame to user space.
Most of the time it's possible to use the hardware to separate the
different parts of the buffer e.g. into separate memory areas or into
separate planes of a multi-plane buffer, but not quite always (the case
we don't care about).
There are currently two ways to do this: either a separate video node or
a multi-plane buffer. Neither seems entirely satisfactory: a multi-plane
buffer is only available to user space once the last part of it is done.
On the other hand, separate video nodes require creating new video nodes
based on the kind of data produced by the sensor, possibly depending on
its configuration.
Frame format descriptor
=======================
The frame format descriptor describes the layout of the frame, not only
the image data but also other parts of it. What struct
v4l2_mbus_framefmt describes is part of it. Changes to
v4l2_mbus_framefmt affect the frame format descriptor rather than the
other way around.
struct v4l2_mbus_frame_desc {
        struct v4l2_mbus_frame_desc_entry
                entry[V4L2_MBUS_FRAME_DESC_ENTRY_MAX];
        unsigned short num_entries;
};
#define V4L2_MBUS_FRAME_DESC_ENTRY_FLAG_BLOB (1 << 0)
#define V4L2_MBUS_FRAME_DESC_ENTRY_FLAG_LEN_IS_MAX (1 << 1)
enum {
        V4L2_MBUS_FRAME_DESC_TYPE_CSI2,
        V4L2_MBUS_FRAME_DESC_TYPE_CCP2,
        V4L2_MBUS_FRAME_DESC_TYPE_PARALLEL,
};
struct v4l2_mbus_frame_desc_entry {
        u8 bpp;
        u16 flags;
        u32 pixelcode;
        union {
                struct {
                        u16 width;
                        u16 height;
                        u16 start_line;
                };
                u32 length;             /* if BLOB flag is set */
        };
        unsigned int type;
        union {
                struct v4l2_mbus_frame_desc_entry_csi2 csi2;
                struct v4l2_mbus_frame_desc_entry_ccp2 ccp2;
                struct v4l2_mbus_frame_desc_entry_parallel par;
        };
};
struct v4l2_mbus_frame_desc_entry_csi2 {
        u8 channel;
};

struct v4l2_mbus_frame_desc_entry_ccp2 {
};

struct v4l2_mbus_frame_desc_entry_parallel {
};
The frame format is defined by the sensor, and the sensor provides a
subdev pad op to obtain the frame format descriptor. This op is used by
the CSI-2 receiver driver.
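As a rough sketch of the receiver side (assuming the existing
get_frame_desc pad op is extended to fill in the structures proposed
above), the CSI-2 receiver could query the remote sensor subdev along
these lines; "sensor" and "pad" stand for the receiver's actual remote
subdev and its source pad:

#include <linux/string.h>
#include <media/v4l2-subdev.h>

static int csi2_query_frame_desc(struct v4l2_subdev *sensor,
                                 unsigned int pad,
                                 struct v4l2_mbus_frame_desc *fd)
{
        int ret;

        memset(fd, 0, sizeof(*fd));

        ret = v4l2_subdev_call(sensor, pad, get_frame_desc, pad, fd);
        if (ret < 0)
                return ret;

        /*
         * The receiver can now set up e.g. one DMA context per entry,
         * or merge the entries it cannot separate into a single plane.
         */
        return 0;
}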
Width and height are the width and height of the actual raw data sent over
the bus. Compressed formats that are transferred as a raw 8-bit image have
width and height that are different from the actual width and height of the
image.
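As an illustration, below is a sketch (not part of the proposal itself)
of how a hypothetical sensor driver might fill in the proposed
descriptor for a frame carrying two lines of embedded metadata, a
1920x1080 raw Bayer image and a JPEG blob. All sizes and the use of a
separate virtual channel for the JPEG data are made up for the example;
V4L2_MBUS_FMT_SENSOR_META in particular does not exist and merely
stands for a sensor specific metadata format.

#include <linux/string.h>
#include <media/v4l2-subdev.h>

static int sensor_get_frame_desc(struct v4l2_subdev *sd, unsigned int pad,
                                 struct v4l2_mbus_frame_desc *fd)
{
        struct v4l2_mbus_frame_desc_entry *entry = fd->entry;

        memset(fd, 0, sizeof(*fd));

        /* Entry 0: sensor specific metadata at the top of the frame. */
        entry[0].bpp = 8;
        entry[0].pixelcode = V4L2_MBUS_FMT_SENSOR_META; /* made-up code */
        entry[0].width = 1920;
        entry[0].height = 2;
        entry[0].start_line = 0;
        entry[0].type = V4L2_MBUS_FRAME_DESC_TYPE_CSI2;
        entry[0].csi2.channel = 0;

        /* Entry 1: the regular image data frame, below the metadata. */
        entry[1].bpp = 10;
        entry[1].pixelcode = V4L2_MBUS_FMT_SGRBG10_1X10;
        entry[1].width = 1920;
        entry[1].height = 1080;
        entry[1].start_line = 2;
        entry[1].type = V4L2_MBUS_FRAME_DESC_TYPE_CSI2;
        entry[1].csi2.channel = 0;

        /* Entry 2: JPEG data as a blob; the length is an upper bound. */
        entry[2].bpp = 8;
        entry[2].flags = V4L2_MBUS_FRAME_DESC_ENTRY_FLAG_BLOB |
                         V4L2_MBUS_FRAME_DESC_ENTRY_FLAG_LEN_IS_MAX;
        entry[2].pixelcode = V4L2_MBUS_FMT_JPEG_1X8;
        entry[2].length = 1920 * 1080;
        entry[2].type = V4L2_MBUS_FRAME_DESC_TYPE_CSI2;
        entry[2].csi2.channel = 1;

        fd->num_entries = 3;

        return 0;
}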
Non-image data (metadata or other blobs)
========================================
There are several ways to pass non-image data to user space. Often the
receiver is able to write the metadata to a different memory location
than the image data whereas sometimes the receiver isn't able to
separate the two. Separating the two has one important benefit: the
metadata is available for the user space automatic exposure algorithm as
soon as it has been written to system memory. We have two cases:
1. The metadata is part of the same buffer (the receiver is unable to
separate the two). The receiver uses a multi-plane buffer type. Each
plane of a multi-plane buffer should have an independent pixelcode
field: the sensor metadata formats are highly sensor dependent whereas
the image formats are not. (A user space sketch for this case follows
the list below.)
2. Non-video data arrives through a separate buffer queue. The user may
activate the link to a second video node to enable metadata capture.
Multiple buffer queues per capture video node should be supported in
this case; that is a topic for another RFC.
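To make case 1 above more concrete, here is a minimal user space
sketch. The two-plane layout (plane 0 image, plane 1 metadata) is an
assumption about how the receiver driver splits the frame, and the
buffer setup (REQBUFS, QUERYBUF, mmap) is omitted:

#include <string.h>
#include <sys/ioctl.h>
#include <linux/videodev2.h>

static int dequeue_image_and_metadata(int video_fd)
{
        struct v4l2_plane planes[2];
        struct v4l2_buffer buf;

        memset(&buf, 0, sizeof(buf));
        memset(planes, 0, sizeof(planes));

        buf.type = V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE;
        buf.memory = V4L2_MEMORY_MMAP;
        buf.m.planes = planes;
        buf.length = 2;

        if (ioctl(video_fd, VIDIOC_DQBUF, &buf) < 0)
                return -1;

        /*
         * planes[0].bytesused bytes of image data and planes[1].bytesused
         * bytes of metadata are now available in the mmap()ed planes; the
         * automatic exposure algorithm can parse the metadata plane here.
         */
        return 0;
}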
Then, how does the user decide which one to choose when the sensor
driver would be able to separate the two but the user might not want
that? The user might also want to just not capture the metadata in the
first place, even if the sensor produced it.
Multi-format image frames
=========================
This is actually another use case. I separated the further description
from the others since this topic could warrant an RFC on its own.
Some sensors are able to produce snapshots (downscaled versions of the
same frames) when capturing still photos. This kind of sensor is
typically used in conjunction with simple receivers without an ISP.
How to control this feature? The link between the sensor and the
receiver models both the physical connection and the properties of the
images produced at one end and consumed at the other.
One option would be to add one layer of abstraction and provide
multiple v4l2_mbus_framefmts to user space. It would also be necessary
to provide enumeration support for them, as well as a way to enable and
disable them should the hardware allow it. An alternative idea could be
to use multiple links for the purpose, but that would not match the idea
of a link as a physical connection.
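To make the first option more concrete, here is a purely hypothetical
sketch of what the per-frame-entry abstraction could look like; none of
these names exist in the subdev API and they are only meant to
illustrate the idea:

#include <media/v4l2-mediabus.h>
#include <media/v4l2-subdev.h>

/* Hypothetical; nothing below exists today. */
struct v4l2_subdev_frame_entry {
        unsigned int index;     /* which sub-image within the frame */
        unsigned int enable;    /* whether the sensor emits this sub-image */
        struct v4l2_mbus_framefmt format;
};

/* Candidate pad ops for enumerating and enabling the sub-images. */
struct v4l2_subdev_frame_entry_ops {
        int (*enum_frame_entry)(struct v4l2_subdev *sd, unsigned int pad,
                                struct v4l2_subdev_frame_entry *fe);
        int (*set_frame_entry)(struct v4l2_subdev *sd, unsigned int pad,
                               struct v4l2_subdev_frame_entry *fe);
};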
Questions and comments are most welcome.
--
Kind regards,
Sakari Ailus
e-mail: sakari.ailus@xxxxxx XMPP: sailus@xxxxxxxxxxxxxx