RFCv2.1: Media controller proposal

Hans Verkuil <hverkuil@xxxxxxxxx> · Sun, 13 Sep 2009 12:35:13 +0200

RFC: Media controller proposal

Version 2.1

Changes from 2.0
----------------

- Forgot to mention Hans de Goede who was part of the recent meeting between
Laurent, Guennadi and myself. My apologies!

- Removed confusing and bogus VIDIOC_S_FMT example.

- Mention the importance of documentation, also for private ioctls.

- Added open issue #7: monitoring.

- Added open issue #8: make the media controller a separate device independent
of V4L.

- Added open issue #9: sysfs vs ioctl.

- Added a link to my initial implementation.

- Start using the term 'pad' as a generic term to describe inputs and outputs.

Background
==========

This RFC is a new version of the original RFC that was written in cooperation
with and on behalf of Texas Instruments about a year ago.

Much work has been done in the past year to put the foundation in place to
be able to implement a media controller and now it is time for this updated
version. The intention is to discuss this in more detail during this years
Plumbers Conference.

Although the high-level concepts are the same as in the original RFC, many
of the details have changed based on what was learned over the past year.

This RFC is based on the original discussions with Manjunath Hadli from TI
last year, on discussions during a recent meeting between Laurent Pinchart,
Hans de Goede, Guennadi Liakhovetski and myself, and on recent discussions
with Nokia. Thanks to Sakari Ailus for doing an initial review of this RFC.

Two notes regarding terminology: a 'board' is the name I use for the SoC,
PCI or USB device that contains the video hardware. Each board has its own
driver instance and its own v4l2_device struct. Originally I called it
'device', but that name is already used in too many places.

A 'pad' is the name for an input or output of some component. I borrowed
this name from gstreamer since I realized that we needed a single term that
describes both an input and an output. Another commonly used term for that
is 'pin', but I decided against that since it is already used to describe
the actual physical hardware pins on a chip or connector. A 'pad' is more
abstract. Data is flowing from a 'source pad' or 'output pad' to a 'sink pad'
or 'input pad'.

What is a media controller?
===========================

In a nutshell: a media controller is a new v4l device node that can be used
to discover and modify the topology of the board and to give access to the 
low-level nodes (such as previewers, resizers, color space converters, etc.)
that are part of the topology.

It does not do any streaming, that is the exclusive domain of video nodes.
It is meant purely for controlling a board as a whole.

It makes very few assumptions on the underlying functionality, it basically
just enumerates components and the links between them and provides a way
of accessing those components.

An initial example implementation is available here:

http://www.linuxtv.org/hg/~hverkuil/v4l-dvb-mc

Why do we need one?
===================

There are currently several problems that are impossible to solve within the
current V4L2 API:

1) Discovering the various device nodes that are typically created by a video
board, such as: video nodes, vbi nodes, dvb nodes, alsa nodes, framebuffer
nodes, input nodes (for e.g. webcam button events or IR remotes).

It would be very handy if an application can just open an /dev/v4l/mc0 node
and be able to figure out where all the nodes are, and to be able to figure
out what the capabilities of the board are (e.g. does it support DVB, is the
audio going through a loopback cable or is there an alsa device, can it do
compressed MPEG video, etc. etc.). Currently the end-user has no choice but to
supply the device nodes manually.

2) Some of the newer SoC devices can connect or disconnect internal components
dynamically. As an example, the omap3 can either connect a sensor output to a
CCDC module to a previewer module to a resizer module and finally to a capture
device node. But it is also possible to capture the sensor output directly
after the CCDC module. The previewer can get its input from another video
device node and output either to the resizer or to another video capture
device node. The same is true for the resizer, that too can get its input from
a device node.

So there are lots of connections here that can be modified at will depending
on what the application wants. And in real life there are even more links than
I mentioned here. And it will only get more complicated in the future.

All this requires that there has to be a way to connect and disconnect parts
of the internal topology of a video board at will.

3) There is an increasing demand to be able to control e.g. sensors or video
encoders/decoders at a much more precise manner. Currently the V4L2 API
provides only limited support in the form of a set of controls. But when
building a high-end camera the developer of the application controlling it
needs very detailed control of the sensor and image processing devices.
On the other hand, you do not want to have all this polluting the V4L2 API
since there is absolutely no sense in exporting this as part of the existing
controls, or to allow for a large number of private ioctls.

What would be a good solution is to give access to the various components of
the board and allow the application to send component-specific ioctls or
controls to it. Any application that will do this is by default tailored to
that board. In addition, none of these new controls or commands will pollute
the namespace of V4L2.

A media controller can solve all these problems: it will provide a window into
the architecture of the board and all its device nodes. Since it is already
enumerating the nodes and components of the board and how they are linked up,
it is only a small step to also use it to change links and to send commands to
specific components.

Restrictions
============

1) These API additions should not affect existing applications.

2) The new API should not attempt to be too smart. All it should do it to give
the application full control of the board and to provide some initial support
for existing applications. E.g. in the case of omap3 you will have an initial
setup where the sensor is connected through all components to a capture device
node. This will provide sufficient support for a standard webcam application,
but if you want something more advanced then the application will have to set
it up explicitly. It may even be too complicated to use the resizer in this
case, and instead only a few resolutions optimal for the sensor are reported.

3) Provide automatic media controller support for drivers that do not create
one themselves. This new functionality should become available to all drivers,
not just new ones. Otherwise it will take a long time before applications like
MythTV will start to use it.

4) All APIs specific to sub-devices MUST BE DOCUMENTED! And it should NEVER be
allowed that those APIs can be abused to do direct register reads/writes!
Everyone should be able to use those APIs, not just those with access to
datasheets.

Implementation
==============

Many of the building blocks needed to implement a media controller already
exist: the v4l core can easily be extended with a media controller type, the
media controller device node can be held by the v4l2_device top-level struct,
and to represent an internal component we have the v4l2_subdev struct.

The v4l2_subdev struct can be extended with a mc_ioctl callback that can be
used by the media controller to pass custom ioctls to the subdev. There is
also a 'core ops' ioctl, but that is better left to internal usage only.

What is missing is that device nodes should be registered with struct
v4l2_device. All that is needed to do that is to ensure that when registering
a video node you always pass a pointer to the v4l2_device struct. A lot of
drivers do this already. In addition one should also be able to register
non-video device nodes (alsa, fb, etc.), so that they can be enumerated.

Since sub-devices are already registered with the v4l2_device there is not
much to do there.

Topology
--------

The topology is represented by entities. Each entity has 0 or more inputs and
0 or more outputs. Each input or output can be linked to 0 or more possible
outputs or inputs from other entities. This is either mutually exclusive 
(i.e. an input/output can be connected to only one output/input at a time)
or it can be connected to multiple inputs/outputs at the same time.

A device node is a special kind of entity with just one input (video capture)
or output (video display). It may have both if it does some in-place operation.

Each entity has a unique numerical ID (unique for the board). Each input or
output has a unique numerical ID as well, but that ID is only unique to the
entity. To specify a particular input or output of an entity one would give
an <entity ID, input/output ID> tuple. Note: to simplify the implementation
it is a good idea to encode this into a u32. The top 20 bits for the entity
ID, the bottom 12 bits for the pad ID.

When enumerating over entities you will need to retrieve at least the
following information:

- type (subdev or device node)
- entity ID
- entity name (a short name)
- entity description (can be quite long)
- subtype (what sort of device node or subdev is it?)
- capabilities (what can the entity do? Specific to the subtype and more
precise than the v4l2_capability struct which only deals with the board
capabilities)
- addition subtype-specific data (union)
- number of inputs and outputs. The input IDs should probably just be a value
of 0 - (#inputs - 1) (ditto for output IDs). Note: it looks like it might
be more efficient to just have the number of pads, and use the capabilities
field to determine whether this entity is a source and/or sink.

Another ioctl is needed to obtain the list of pads and the possible links that
can be made from/to each pad for an entity. That way two ioctl calls should be
sufficient to get all the information about an entity and how it can be hooked
up.

It is good to realize that most applications will just enumerate e.g. capture
device nodes. Few applications will do a full scan of the whole topology.
Instead they will just specify the unique entity ID and if needed the
input/output ID as well. These IDs are declared in the board or sub-device
specific header.

A full enumeration will typically only be done by some sort of generic
application like v4l2-ctl.

In addition, most entities will have only one or two inputs/outputs at most.
So we might optimize the data structures for this. We probably will have to
see how it goes when we implement it. Note: based on the current test
implementation such an optimization would be a good idea.

We obviously need ioctls to make and break links between entities. It
shouldn't be hard to do this.

Access to sub-devices
---------------------

What is a bit trickier is how to select a sub-device as the target for ioctls.
Normally ioctls like S_CTRL are sent to a /dev/v4l/videoX node and the driver
will figure out which sub-device (or possibly the bridge itself) will receive
it. There is no way of hijacking this mechanism to e.g. specify a specific
entity ID without also having to modify most of the v4l2 structs by adding
such an ID field. But with the media controller we can at least create an
ioctl that specifies a 'target entity' that will receive any non-media
controller ioctl. Note that for now we only support sub-devices as the target
entity.

The idea is this:

mc = open("/dev/v4l/mc0", O_RDWR);
// Select a particular target entity
ioctl(mc, VIDIOC_S_ENTITY, &histogramID);
// Send a custom ioctl to that entity
ioctl(mc, VIDIOC_OMAP3_G_HISTOGRAM, &hist);

This requires no API changes and is very easy to implement. One problem is
that this is not thread-safe. A good and simple solution is to store the
target entity as part of the filehandle. So you can open the media controller
multiple times and each handle can set its own target entity.

This also has the advantage that you can have a filehandle 'targeted' at a
resizer and a filehandle 'targeted' at the previewer, etc. If you want to use
the same filehandle from multiple threads, then you have to implement locking
yourself.

Open issues
===========

In no particular order:

1) How to tell the application that this board uses an audio loopback cable
to the PC's audio input?

2) There can be a lot of device nodes in complicated boards. One suggestion
is to only register them when they are linked to an entity (i.e. can be
active). Should we do this or not? Note: I fear that this will lead to quite
complex drivers. I do not plan on adding such functionality for the first
version of the media controller.

3) Format and bus configuration and enumeration. Sub-devices are connected
together by a bus. These busses can have different configurations that will
influence the list of possible formats that can be received or sent from
device nodes. This was always pretty straightforward, but if you have several
sub-devices such as scalers and colorspace converters in a pipeline then this
becomes very complex indeed. This is already a problem with soc-camera, but
that is only the tip of the iceberg.

How to solve this problem is something that requires a lot more thought.

4) One interesting idea is to create an ioctl with an entity ID as argument
that returns a timestamp of frame (audio or video) it is processing. That
would solve not only sync problems with alsa, but also when reading a stream
in general (read/write doesn't provide for a timestamp as streaming I/O does).

5) I propose that we return -ENOIOCTLCMD when an ioctl isn't supported by the
media controller. Much better than -EINVAL that is currently used in V4L2.

6) For now I think we should leave enumerating input and output connectors
to the bridge drivers (ENUMINPUT/ENUMOUTPUT). But as a future step it would
make sense to also enumerate those in the media controller. However, it is
not entirely clear what the relationship will be between that and the
existing enumeration ioctls.

7) Monitoring: the media controller can also be used very effectively to
monitor a board. Events can be passed up from sub-devices to the media
controller and through select() to the application. So a sub-device
controlling a camera motor can signal the application through the media
controller when the motor has reached its final destination. Using the
media controller for this is more appropriate than a streaming video node,
since it is not always clear to which video node an event should be sent.

Of course, some events (e.g. signalling when an MPEG decoder finished
processing the pending buffers) are clearly video node specific.

8) The media controller is currently part of the v4l2 framework. But perhaps
it should be split off as a separate media device with its own major device
number. The internal data structures are v4l2-agnostic. It would probably
require some rework to achieve this, but it would make this something that
can easily be extended to other subsystems.

9) Use of sysfs. It is no secret that I see sysfs in a supporting role only.
In my opinion sysfs is too cumbersome and simply not powerful enough compared
to ioctls. There are some cases where it can be useful to expose certain data
to sysfs as well (e.g. controls), but that should be in addition to the main
ioctl API and it should not replace it.

Regards,

	Hans

-- 
Hans Verkuil - video4linux developer - sponsored by TANDBERG Telecom
--
To unsubscribe from this list: send the line "unsubscribe linux-media" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html