Re: Mem2Mem V4L2 devices [RFC]

"Ivan T. Ivanov" <iivanov@xxxxxxxxxx> · Fri, 02 Oct 2009 19:25:05 +0300

Hi Marek, 

On Fri, 2009-10-02 at 13:45 +0200, Marek Szyprowski wrote:
> Hello,
> 
> During the V4L2 mini-summit and the Media Controller RFC discussion at
> the Linux Plumbers Conference 2009, a mem2mem video device was mentioned
> a few times (usually in the context of a 'resizer device', which might be
> part of a camera interface pipeline or work as a standalone device). We
> are researching how our custom video/multimedia drivers can fit
> into the V4L2 framework. Most of our multimedia devices work in mem2mem
> mode.
> 
> I did some quick research and found that the V4L2 framework currently
> has no device that processes video data in a memory-to-memory
> model. In V4L2 terms, such a device would be both a video sink
> and a video source at the same time. The main problem is how the video
> nodes (/dev/videoX) should be assigned to such a device.
> 
> The simplest way of implementing a mem2mem device in the v4l2 framework
> would use two video nodes (one for input and one for output). Such an
> idea was already suggested at the V4L2 mini-summit. Each DMA engine (either
> input or output) that is available in the hardware should get its own 
> video node. In this approach an application can write() source image to 
> for example /dev/video0 and then read the processed output from for 
> example /dev/video1. Source and destination format/params/other custom 
> settings also can be easily set for either source or destination node. 
> Besides a single image, user applications can also process video streams 
> by calling stream_on(), qbuf() + dqbuf(), stream_off() simultaneously on 
> both video nodes. 
> 
> This approach has a limitation however. As user applications would have 
> to open 2 different file descriptors to perform the processing of a 
> single image, the v4l2 driver would need to match read() calls done on
> one file descriptor with write() calls from the other. The same thing
> would happen with buffers enqueued with qbuf(). In practice, this would 
> result in a driver that allows only one instance of /dev/video0 as well 
> as /dev/video1 opened. Otherwise, it would not be possible to track 
> which opened /dev/video0 instance matches which /dev/video1 one. 
> 
> The real limitation of this approach is that it is hardly
> possible to implement multi-instance support and application
> multiplexing on a video device. In a typical embedded system, in
> contrast to most video-source-only or video-sink-only devices, a mem2mem 
> device is very often used by more than one application at a time. Be it 
> either simple one-shot single video frame processing or stream 
> processing. Just consider that the 'resizer' module might be used in 
> many applications for scaling bitmaps (xserver video subsystem, 
> gstreamer, jpeglib, etc) only. 
> 
> At first glance one might think that multi-instance support should be
> implemented in a userspace daemon instead of in the mem2mem drivers.
> However, I have run into problems designing such a userspace daemon.
> Usually, video buffers are passed to v4l2 device as a user pointer or 
> are mmaped directly from the device. The main issue that cannot be 
> easily resolved is passing video buffers from the client application to 
> the daemon. The daemon would queue a request on the device and return 
> results back to the client application after a transaction is finished. 
> Passing userspace pointers between an application and the daemon cannot 
> be done, as they are two different processes. Mmap-type buffers are 
> similar in this aspect - at least 2 buffer copy operations are required 
> (from client application to device input buffers mmaped in daemon's 
> memory and then from device output buffers to client application). 
> Buffer copying and process context switches add both latency and
> additional CPU workload.
>
> In our custom drivers for mem2mem multimedia devices we implemented a
> queue shared between all instances of an opened mem2mem device. Each
> instance is assigned to an open device file
> descriptor. The queue is serviced in the device context, thus maximizing 
> the device throughput. This is achieved by scheduling the next 
> transaction in the driver (kernel) context. This may not even require a 
> context switch at all. 
> 
> Do you have any ideas how this solution would fit into the current v4l2
> design?
> 
> Another solution that came into my mind that would not suffer from this 
> limitation is to use the same video node for both writing input buffers 
> and reading output buffers (or queuing both input and output buffers). 
> Such a design causes more problems with the current v4l2 design however: 
> 
> 1. How to set a different color space or size for the input and output
> buffers? It could be solved by adding a set of ioctls to get/set source
> image format and size, while the existing v4l2 ioctls would only refer 
> to the output buffer. Frankly speaking, we don't like this idea. 

I think it is not unusual for a single video device to declare that it
supports input and output operation at the same time.

Let's take a resizer device as an example. It can always inform the
user space application that

struct v4l2_capability.capabilities ==
		(V4L2_CAP_VIDEO_CAPTURE | V4L2_CAP_VIDEO_OUTPUT)

The user can then issue an S_FMT ioctl supplying

struct v4l2_format.type = V4L2_BUF_TYPE_VIDEO_CAPTURE
		  .fmt.pix = width x height

which instructs the device to prepare its output at this
resolution. After that, the user can issue a second S_FMT ioctl supplying

struct v4l2_format.type = V4L2_BUF_TYPE_VIDEO_OUTPUT
		  .fmt.pix = width x height

These two ioctls alone should be enough for the device driver
to work out the required down/up scale factor.

Regarding color space, struct v4l2_pix_format has a 'pixelformat' field,
which can be used to describe the content of the input and output buffers.
So, using only the existing ioctls, the user can have a working resizer device.

Also, please note that there is VIDIOC_S_CROP, which adds the
flexibility of cropping on the input or the output.

The last thing to do is to QBUF 2 buffers and call STREAMON.

I think this will simplify buffer synchronization a lot.

iivanov

> 
> 2. Input and output in the same video node would not be compatible with 
> the upcoming media controller, with which we will get an ability to 
> arrange devices into a custom pipeline. Piping together two separate 
> input-output nodes to create a new mem2mem device would be difficult and 
> unintuitive. And that is not even considering multi-output devices.
> 
> My idea is to get back to the "2 video nodes per device" approach and 
> introduce a new ioctl for matching input and output instances of the 
> same device. When such an ioctl could be called is another question. I 
> like the idea of restricting such a call to be issued after opening 
> video nodes and before using them. Using this ioctl, a user application 
> would be able to match output instance to an input one, by matching 
> their corresponding file descriptors. 
> 
> What do you think of such a solution? 
> 
> Best regards
> --
> Marek Szyprowski
> Samsung Poland R&D Center
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-media" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
