Mem2Mem V4L2 devices [RFC]

Marek Szyprowski <m.szyprowski@xxxxxxxxxxx> · Fri, 02 Oct 2009 13:45:13 +0200

Hello,

During the V4L2 mini-summit and the Media Controller RFC discussion at
the Linux Plumbers Conference 2009, mem2mem video devices were mentioned
a few times (usually in the context of a 'resizer device', which might be
part of a camera interface pipeline or work as a standalone device). We
are researching how our custom video/multimedia drivers can fit into the
V4L2 framework. Most of our multimedia devices work in mem2mem mode.

I did some quick research and found that the V4L2 framework currently
has no device that processes video data in a memory-to-memory model. In
V4L2 terms, such a device would be both a video sink and a video source
at the same time. The main problem is how video nodes (/dev/videoX)
should be assigned to such a device.

The simplest way to implement a mem2mem device in the V4L2 framework
would be to use two video nodes (one for input and one for output). This
idea was already suggested at the V4L2 mini-summit. Each DMA engine
(either input or output) available in the hardware would get its own
video node. In this approach, an application can write() a source image
to, for example, /dev/video0 and then read() the processed output from,
for example, /dev/video1. Source and destination formats, parameters and
other custom settings can also easily be set independently on each node.
Besides single images, user applications can also process video streams
by issuing stream_on(), qbuf() + dqbuf() and stream_off()
(VIDIOC_STREAMON, VIDIOC_QBUF, VIDIOC_DQBUF, VIDIOC_STREAMOFF) on both
video nodes simultaneously.
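
A minimal userspace sketch of the single-frame path in the two-node
model, assuming both nodes are already open and their formats are set.
m2m_process_frame() is a hypothetical helper, not an existing V4L2 call;
it issues one write() on the input node and one read() on the output
node.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper for the two-node model: write one source frame to
 * the input node and read one processed frame back from the output node.
 * Both descriptors are assumed to be open and configured already.
 * Returns the number of bytes read, or -errno on failure.
 */
ssize_t m2m_process_frame(int in_fd, int out_fd,
                          const void *src, size_t src_len,
                          void *dst, size_t dst_cap)
{
    assert(src != NULL && dst != NULL && dst_cap > 0);

    ssize_t n = write(in_fd, src, src_len);
    if (n < 0)
        return -errno;
    if ((size_t)n != src_len)
        return -EIO;                  /* frame not accepted in one piece */

    n = read(out_fd, dst, dst_cap);   /* one read() per processed frame */
    if (n < 0)
        return -errno;
    return n;
}
```

In a real application the two descriptors would come from opening
/dev/video0 and /dev/video1; for a quick test, the two ends of a pipe()
can stand in for them, in which case the "processed" frame is simply the
input echoed back. Note that nothing in this call ties in_fd to out_fd.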

This approach has a limitation, however. Since a user application would
have to open two different file descriptors to process a single image,
the V4L2 driver would need to match read() calls on one file descriptor
with write() calls on the other. The same applies to buffers enqueued
with qbuf(). In practice, this would result in a driver that allows only
one open instance of /dev/video0 and one of /dev/video1. Otherwise, it
would be impossible to track which open /dev/video0 instance matches
which /dev/video1 instance.

The real problem with this approach is that it makes multi-instance
support and application multiplexing on a video device hardly possible.
In a typical embedded system, in contrast to most video-source-only or
video-sink-only devices, a mem2mem device is very often used by more
than one application at a time, whether for simple one-shot processing
of a single video frame or for stream processing. Just consider that the
'resizer' module alone might be used by many applications just for
scaling bitmaps (the X server video subsystem, GStreamer, jpeglib, etc.).

At first glance, one might think that multi-instance support should be
implemented in a userspace daemon instead of in the mem2mem drivers.
However, I have run into problems designing such a userspace daemon.
Usually, video buffers are passed to a V4L2 device as user pointers or
are mmaped directly from the device. The main issue that cannot be
easily resolved is passing video buffers from the client application to
the daemon, which would queue a request on the device and return the
results to the client application after the transaction is finished.
Userspace pointers cannot be passed between an application and the
daemon, as they belong to two different processes. Mmap-type buffers are
similar in this respect: at least two buffer copy operations are
required (from the client application to the device input buffers mmaped
in the daemon's memory, and then from the device output buffers back to
the client application).
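
To put a number on those two copies, a small sketch that counts the
bytes a hypothetical daemon would memcpy() per frame. It assumes packed
pixel formats with a whole number of bytes per pixel (for example YUYV at
2 bytes per pixel); the resolutions used below are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Bytes copied per frame by a userspace daemon that owns the mmaped
 * device buffers: the source frame is copied into the device input
 * buffer, and the result is copied back out to the client.
 */
uint64_t daemon_copy_bytes(uint32_t src_w, uint32_t src_h,
                           uint32_t dst_w, uint32_t dst_h,
                           uint32_t bytes_per_pixel)
{
    assert(bytes_per_pixel > 0);
    uint64_t in  = (uint64_t)src_w * src_h * bytes_per_pixel;
    uint64_t out = (uint64_t)dst_w * dst_h * bytes_per_pixel;
    return in + out;   /* copy #1 (into daemon) + copy #2 (back to client) */
}
```

For instance, resizing 640x480 YUYV to 320x240 gives 614400 + 153600 =
768000 bytes of copying per frame, or about 23 MB/s at 30 frames per
second, on top of the context switches between client and daemon.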
Buffer copying and process context switches add both latency and
additional CPU load. In our custom drivers for mem2mem multimedia
devices, we implemented a queue shared between all instances of an
opened mem2mem device. Each instance is tied to an open device file
descriptor. The queue is serviced in the device context, which maximizes
device throughput: the next transaction is scheduled from within the
driver (kernel) context, possibly without requiring a context switch at
all.
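
A minimal, single-threaded model of such a shared queue. All names are
hypothetical, locking is omitted (a real driver needs it because qbuf()
and the completion interrupt run concurrently), and round-robin between
instances is an assumed scheduling policy; the text above only says the
queue is shared and serviced in device context.

```c
#include <assert.h>
#include <stddef.h>

struct m2m_ctx {                  /* one per open file descriptor */
    int id;
    int pending;                  /* transactions queued, not yet finished */
    int queued;                   /* nonzero while linked into the queue */
    struct m2m_ctx *next;
};

struct m2m_dev {
    struct m2m_ctx *head, *tail;  /* ready queue shared by all instances */
    struct m2m_ctx *running;      /* instance currently owning the hardware */
    int log[32];                  /* ids of finished transactions, in order */
    int nlog;
};

static void enqueue_ctx(struct m2m_dev *dev, struct m2m_ctx *ctx)
{
    if (ctx->queued)
        return;
    ctx->next = NULL;
    ctx->queued = 1;
    if (dev->tail)
        dev->tail->next = ctx;
    else
        dev->head = ctx;
    dev->tail = ctx;
}

/* Start the next transaction if the hardware is idle. */
static void try_run(struct m2m_dev *dev)
{
    if (dev->running || !dev->head)
        return;
    struct m2m_ctx *ctx = dev->head;
    dev->head = ctx->next;
    if (!dev->head)
        dev->tail = NULL;
    ctx->queued = 0;
    dev->running = ctx;
    /* A real driver would program the DMA engines here. */
}

/* Called from qbuf() on any instance. */
void m2m_queue_job(struct m2m_dev *dev, struct m2m_ctx *ctx)
{
    ctx->pending++;
    if (ctx != dev->running)
        enqueue_ctx(dev, ctx);
    try_run(dev);
}

/* Called from the "transaction done" interrupt. */
void m2m_job_done(struct m2m_dev *dev)
{
    struct m2m_ctx *ctx = dev->running;
    assert(ctx != NULL);
    dev->running = NULL;
    ctx->pending--;
    if (dev->nlog < (int)(sizeof dev->log / sizeof dev->log[0]))
        dev->log[dev->nlog++] = ctx->id;
    if (ctx->pending > 0)
        enqueue_ctx(dev, ctx);    /* back of the queue: round-robin */
    try_run(dev);                 /* chain the next job from kernel context */
}
```

Because m2m_job_done() starts the next queued transaction itself, the
hardware moves from one instance's job to another's without waiting for
any userspace process to be scheduled.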

Do you have any ideas how this solution would fit into the current V4L2
design?

Another solution that came to my mind, which would not suffer from this
limitation, is to use the same video node both for writing input buffers
and for reading output buffers (or for queuing both input and output
buffers). However, such a design causes more problems with the current
V4L2 design:

1. How to set a different color space or size for the input and the
output buffer? This could be solved by adding a set of ioctls to get/set
the source image format and size, while the existing V4L2 ioctls would
refer only to the output buffer. Frankly speaking, we don't like this
idea.

2. Having input and output on the same video node would not be
compatible with the upcoming media controller, which will give us the
ability to arrange devices into custom pipelines. Piping together two
separate input-output nodes to create a new mem2mem device would be
difficult and unintuitive, not even considering multi-output devices.

My idea is to return to the "2 video nodes per device" approach and
introduce a new ioctl for matching input and output instances of the
same device. When such an ioctl could be called is another question. I
like the idea of restricting this call to the window after the video
nodes have been opened and before they are used. Using this ioctl, a
user application would be able to match an output instance to an input
instance by pairing their file descriptors.
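
A minimal model of the driver side of such a pairing ioctl. It assumes
the call is issued on the output instance with the input instance's file
descriptor as argument (the fd-to-instance lookup is elided), and all
names are hypothetical. Pairing is refused once either instance has been
used, and I/O is refused until the instance is paired.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-open state of either node. */
struct m2m_file {
    bool is_input;                /* opened on the input (source) node */
    bool used;                    /* any read/write/qbuf/streamon issued */
    struct m2m_file *peer;        /* matching instance on the other node */
};

/* Pair an output instance with an input instance; only before first use. */
int m2m_pair(struct m2m_file *out, struct m2m_file *in)
{
    assert(out != NULL && in != NULL);
    if (out->is_input || !in->is_input)
        return -EINVAL;           /* must join one output with one input */
    if (out->used || in->used)
        return -EBUSY;            /* too late: instance already in use */
    if (out->peer || in->peer)
        return -EBUSY;            /* already paired */
    out->peer = in;
    in->peer = out;
    return 0;
}

/* Every I/O entry point would first require a pairing. */
int m2m_start_io(struct m2m_file *f)
{
    assert(f != NULL);
    if (!f->peer)
        return -EINVAL;           /* not paired yet */
    f->used = true;
    return 0;
}
```

With the pairing stored per open file, any number of input/output
instance pairs can coexist, and a shared job queue like the one our
drivers use can still schedule transactions across all of them.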

What do you think of such a solution? 

Best regards
--
Marek Szyprowski
Samsung Poland R&D Center

--
To unsubscribe from this list: send the line "unsubscribe linux-media" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html