Introduction
------------

A meeting was held from March 10-12 at the Tandberg offices in Lysaker,
Norway, between developers from Nokia, Samsung, Intel and Tandberg to
informally discuss V4L2 memory handling issues. This was in preparation
for a v4l-dvb mini-summit that will hopefully happen in early May in
Norway (I plan to have more definitive information available next week).

The main purpose was to gain a better understanding of the problems that
SoC developers face when dealing with video4linux, in particular the way
it handles video buffers and the streaming API.

Location: Tandberg office, Lysaker, Norway
          March 10-12, 2010

Attendees:
  Sakari Ailus (Nokia)
  Morten Hestnes (Tandberg)
  Pawel Osciak (Samsung)
  Laurent Pinchart (Nokia)
  Marek Szyprowski (Samsung)
  Hans Verkuil (Tandberg)
  Xiaolin Zhang (Intel)

Summary of the meeting
----------------------

1) Memory-to-memory devices

Original thread with the proposal from Samsung:
http://www.mail-archive.com/linux-samsung-soc@xxxxxxxxxxxxxxx/msg00391.html

Depending on the hardware/driver there can be two methods of doing this:

a) Separate capture and output video nodes. This already works, but has
   the limitation that only one capture and one output file handle can be
   streaming, just as is the case right now with regular capture and
   output nodes.

b) Provide a single video node that can do both capture and output. This
   is what the Samsung patch provides. Here the videobuf queues are placed
   in the file handle struct. This allows multiplexing (it must be possible
   to add priority queue support later, but FIFO is used for now). STREAMON
   must be called for both the capture and the output queue before the
   hardware starts processing. Such nodes will set both the capture and
   output capabilities. Drivers will have to store the hardware
   configuration (e.g. capture/output formats) in the file handle struct
   as well, since this is now completely local to the file handle.

2) videobuf framework and the V4L2 streaming API

The videobuf framework and the streaming API received a lot of attention
during this meeting. Many shortcomings and bugs were found. We even
discussed whether a new videobuf framework should be created from
scratch, but this idea was abandoned. The general opinion was that while
there are many, many problems with videobuf, they are also relatively
easy to fix.

We identified the following issues in the streaming API and videobuf:

a) REQBUFS with count == 0

Currently both the spec and drivers handle this inconsistently. Our
proposal is this (an RFC will be posted in the near future):

- Return -EBUSY if the device is still streaming or if buffers are still
  mapped.
- Remove the implicit STREAMOFF behavior from the spec.
- Must return an error if the memory mode is not supported.
- Must release all buffers otherwise.
- On return the count field in the struct remains 0.

b) DQBUF handling of buffer errors

If a DMA error occurs while capturing a buffer, the buffer status is set
to VIDEOBUF_ERROR. However, in the current implementation VIDIOC_DQBUF
will return -EIO and the v4l2_buffer struct will not be copied back to
userspace, so while the buffer is dequeued internally, the userspace
application has no idea which buffer that was. From the point of view of
the application this buffer has effectively been lost.

Proposed fix:

- Never return an error if the buffer has been dequeued, otherwise that
  buffer is lost to the application.
- Add a new error flag to tell the application that the buffer had an
  error status.
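To make the proposed fix a bit more concrete, here is a minimal sketch of
a capture dequeue loop as it could look under this proposal. This is only
an illustration: the flag name V4L2_BUF_FLAG_ERROR and its value are
placeholders invented here, not part of the current API.

#include <errno.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/videodev2.h>

/* Placeholder for the proposed "buffer had an error" flag. */
#ifndef V4L2_BUF_FLAG_ERROR
#define V4L2_BUF_FLAG_ERROR 0x0040
#endif

static int dequeue_one(int fd)
{
        struct v4l2_buffer buf;

        memset(&buf, 0, sizeof(buf));
        buf.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
        buf.memory = V4L2_MEMORY_MMAP;

        /* Under the proposal DQBUF only fails if nothing was dequeued. */
        if (ioctl(fd, VIDIOC_DQBUF, &buf) < 0)
                return -errno;

        if (buf.flags & V4L2_BUF_FLAG_ERROR) {
                /* The DMA failed, but we still know which buffer it was,
                   so it can simply be requeued and is not lost. */
                return ioctl(fd, VIDIOC_QBUF, &buf);
        }

        /* ... process buf.index / buf.bytesused here ... */
        return 0;
}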
c) videobuf framework

Many bugs and confusing constructs were identified. Note that proposed
name changes are only that: proposals. When the actual implementation is
made the names may turn out to be completely different from what is
suggested here.

- Fix poll behavior for output: POLLIN|POLLRDNORM is set for output
  devices instead of POLLOUT|POLLWRNORM as specified in the spec. This
  must be fixed for mem-to-mem devices.
- Run checkpatch.pl over all videobuf sources + headers and post cleanup
  patches.
- Propose to remove the magic fields in the ops. There seems to be
  absolutely no reason for them; all the magic checks just pollute the
  code.
- Rename confusing variable, function and macro names.
- Add support for REQBUFS calls with count == 0.
- Split videobuf_queue_cancel into cancel and free operations. Do not
  free the buffers on streamoff, that's really wrong.
- Rename buf_setup to queue_negotiate.
- Split buf_prepare() into buf_init() (only called once, on the first
  QBUF) and buf_prepare().
- Move the actual buffer allocation to REQBUFS as per the spec. Currently
  the buffers are allocated during mmap, which is unexpected. The
  application should be able to rely on the 'count' that REQBUFS returns.
- Split buf_release() into buf_cleanup() (only called once) and
  queue_cancel().
- Move the wait queue per buffer to a wait queue per videobuf_queue. This
  makes it possible to dequeue buffers in a different order from how they
  were queued. This functionality is needed for the Samsung SoCs. It also
  makes more sense.

There are several issues that relate to the way queue types are handled
(e.g. vmalloc, dma-sg, etc.). These need more discussion. Cleaning up the
videobuf framework will no doubt help in understanding these problems
better and in designing the proper solution. So for now consider these
just ideas that need a lot more work:

- Add buf_finish (counterpart to buf_prepare)? Perhaps this should go
  into the qtype ops?
- Freeing and allocating buffers is asymmetrical. Allow the driver to
  somehow override memory allocation and freeing.
- Perhaps introduce memtype ops that drivers can easily override for
  certain arch-specific operations?
- Call sync() in QBUF as well (add an argument to sync to tell the
  direction). But it should be possible to override this in the driver.

What these items have in common is that they seem to point in the
direction of creating an ops struct with low-level (platform dependent)
operations for each qtype that drivers can easily override.

d) Avoid expensive cache operations

When dealing with large buffers, the cache operations on e.g. omap3
become very expensive. One issue is that when dequeuing a buffer after
capturing, the cache needs to be invalidated (either completely, or
'just' for the cache lines referring to the buffer memory). Otherwise the
CPU might read from the cache instead of from the DMA-ed buffer. The
other issue is that when calling QBUF to pass a buffer for output DMA,
the cache needs to be cleaned to ensure that all data in the cache is
written to the buffer first.

These operations are expensive and applications should be able to tell
the queueing system to avoid them if they are not necessary. E.g. if the
application just captures a buffer, never touches it and gives it to an
output node, then there is no need to do any cache operations.

The proposed solution is to add new flags (or perhaps just one if we can
think of a suitable name):

- QBUF: add a flag telling the driver that the buffer is uncached.
- DQBUF: add a flag that the buffer won't be accessed by the CPU.

(Note: any mistakes in this explanation are mine. Please correct me if I
made an error here. Cache handling is not my specialty.)
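As a rough illustration of the zero-copy case described above, the sketch
below moves a buffer from a capture node to an output node using two
hypothetical cache-hint flags, following the two flags proposed above.
The flag names and values are invented here purely for illustration, and
all setup (REQBUFS, STREAMON, buffer allocation) is omitted.

#include <string.h>
#include <sys/ioctl.h>
#include <linux/videodev2.h>

/* Placeholder names/values for the proposed cache-hint flags. */
#ifndef V4L2_BUF_FLAG_NO_CACHE_INVALIDATE
#define V4L2_BUF_FLAG_NO_CACHE_INVALIDATE 0x0800
#endif
#ifndef V4L2_BUF_FLAG_NO_CACHE_CLEAN
#define V4L2_BUF_FLAG_NO_CACHE_CLEAN      0x1000
#endif

/* Dequeue from the capture node. The CPU will never read the data, so
   the driver is told not to invalidate the cache for this buffer. */
static int dq_capture(int cap_fd, struct v4l2_buffer *buf)
{
        memset(buf, 0, sizeof(*buf));
        buf->type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
        buf->memory = V4L2_MEMORY_USERPTR;
        buf->flags = V4L2_BUF_FLAG_NO_CACHE_INVALIDATE;
        return ioctl(cap_fd, VIDIOC_DQBUF, buf);
}

/* Queue the same memory on the output node. The CPU never wrote to the
   buffer, so nothing in the cache needs to be cleaned first. */
static int q_output(int out_fd, struct v4l2_buffer *buf)
{
        buf->type = V4L2_BUF_TYPE_VIDEO_OUTPUT;
        buf->flags = V4L2_BUF_FLAG_NO_CACHE_CLEAN;
        return ioctl(out_fd, VIDIOC_QBUF, buf);
}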
3) Multi-planar proposal

For the Samsung patch series see:
http://www.mail-archive.com/linux-media@xxxxxxxxxxxxxxx/msg16456.html

The proposal was well received and a few suggestions were made. In
particular the idea of allowing for meta planes (i.e. 'planes' that
contain associated meta data like histogram information) and config
planes (i.e. 'planes' that contain hardware configuration structures that
allow reconfiguration of the hardware for each frame, mostly relevant to
mem-to-mem devices) caused some changes in the proposal.

- The proposed public API is basically OK.
- It would be nice in the future to have a variable number of planes (for
  optional meta and config planes). Extra planes would have to be USERPTR
  (i.e. non-DMA). The fourcc code determines the number of actual DMA
  planes.
- Add an hdr_size field to tell the application where the actual video
  data starts, as some hardware adds meta/dummy data to the front of the
  frame.
- 'Unused' planes should probably set bytesused to 0.
- Set bytesused in v4l2_buffer to 0 in the case of multi-planar.
- Increase the number of reserved fields.
- Make the plane array a fixed size (e.g. 8) internally to simplify the
  code. (A purely illustrative sketch of what such a plane descriptor
  might look like is included just before the action points below.)

4) Using buffers between several video nodes or drivers

One feature that is very much desired is the ability to do zero-copy
between video device nodes, i.e. it should be easy to use a captured
buffer as input to an output video node, or to capture directly into an
openGL texture buffer or into a framebuffer. Currently this is tricky to
do.

We discussed more generic solutions. No definitive conclusions were
reached and the notes below are basically just brainstorming ideas and
nothing more.

- Add a pool memory mode (in addition to MMAP/USERPTR/OVERLAY).
- The pool index will refer to a memory context struct that keeps track
  of the state of that memory.
- Do we need a multipool?
- Add ALLOC/FREE_BUF ioctls to allocate/free buffers?
- Add (UN)REGISTER_BUF ioctls to (un)register 'external' buffers like
  framebuffers or openGL texture buffers?
- Loads of locking/resource tracking things to figure out.
- Global videodev pool? Per-device pool? Allow both? We should probably
  start with a per-device pool.
- How to connect to X/openGL/others? Look at existing APIs: VA API,
  VDPAU, TTM.
- Some devices (Samsung) have different memory requirements (alignment,
  memory bank, location in the memory bank) depending on the device node
  and the format. This implies that ALLOC_BUF has to be called on the
  video device node. Perhaps also allow it on the media controller for
  non-Samsung devices.
- Intel is looking at using TTM for video buffers. We need to stay in
  touch and see what comes out of that effort.

5) Media controller progress

We discussed the media controller progress as well:

- Working, but still missing locking and not yet using mbus_fmt.
- struct mbus_framefmt might need more work with regard to
  padding/cropping.
- Proposed roadmap:
  1) Submit subdev device nodes: must be optional per subdev driver.
  2a) Event support in subdevs
  2b) Control framework
  3) Media controller: first just enumeration. Later add link management.
  4) Public mbus ioctl API for subdevs.
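To make the multi-planar notes in section 3 a little more concrete, here
is the purely illustrative plane descriptor sketch referred to above. All
names, types and the layout are invented for this sketch; they do not
reflect the actual Samsung patches (see the linked thread for those).

#include <linux/types.h>

#define MAX_PLANES_SKETCH 8     /* fixed-size plane array, as discussed */

/* Illustrative only: one plane of a multi-planar buffer. */
struct plane_sketch {
        __u32 bytesused;        /* 0 for an unused plane */
        __u32 length;           /* total size of the plane memory */
        __u32 hdr_size;         /* meta/dummy bytes before the video data */
        union {
                __u32 mem_offset;       /* MMAP planes */
                unsigned long userptr;  /* USERPTR (e.g. meta/config) planes */
        } m;
        __u32 reserved[8];      /* plenty of room for future extensions */
};

/* A multi-planar buffer would then carry a fixed array of planes; the
   fourcc of the pixel format determines how many of them hold actual DMA
   data, and bytesused in v4l2_buffer itself would be set to 0. */
struct buffer_planes_sketch {
        __u32 num_planes;
        struct plane_sketch planes[MAX_PLANES_SKETCH];
};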
Action points
=============

- Nokia: continue with Media Controller
- Samsung: videobuf cleanup
           RFC: REQBUFS count == 0
           RFC: DQBUF error frames
           RFC: cache flags
           RFC: mem-to-mem
           RFC: multi-planar
- Hans: control handling framework
- Intel: keep us informed on the TTM work
         keep us informed on Moorestown V4L2 work (code preview?)

Regards,

	Hans Verkuil

-- 
Hans Verkuil - video4linux developer - sponsored by TANDBERG