On Mon, Jun 4, 2018 at 10:33 PM Mauro Carvalho Chehab <mchehab+samsung@xxxxxxxxxx> wrote:
>
> Hi all,
>
> I have hopefully consolidated all the comments I received on the past
> announcement regarding the complex camera workshop we're planning to
> hold in Tokyo, just before the Open Source Summit in Japan.
>
> The main focus of the workshop is to allow supporting devices with
> MC-based hardware connected to a camera.
>
> I'm enclosing a detailed description of the problem, in order to
> allow the interested parties to be on the same page.
>
> We need to work towards an agenda for the meeting.
>
> From my side, I think we should have at least the following topics on
> the agenda:
>
> - a quick review of what's currently in libv4l2;
> - a presentation about the PipeWire solution;
> - a discussion of the requirements for the new solution;
> - a discussion about how we'll address it - who will do what.
>
> Comments? Suggestions?
>
> Is anyone else planning to be there, either physically or via
> Google Hangouts?
>
> Tomasz,
>
> Do you have any limit on the number of people who could join us
> via Google Hangouts?
>
>
> Regards,
> Mauro
>
> ---
>
> 1. Introduction
> ===============
>
> 1.1 V4L2 Kernel aspects
> -----------------------
>
> The media subsystem supports two types of devices:
>
> - "traditional" media hardware, supported via the V4L2 API. On such
>   hardware, opening a single device node (usually /dev/video0) is
>   enough to control the entire device. We call these devnode-based
>   devices. An application may sometimes need to use multiple video
>   nodes with devnode-based drivers to capture multiple streams in
>   parallel (when the hardware allows it, of course). That's quite
>   common for analog TV devices, where both /dev/video0 and /dev/vbi0
>   are opened at the same time.
>
> - Media-controller based devices. On those devices, there are
>   typically several /dev/video? nodes and several /dev/v4l2-subdev?
>   nodes, plus a media controller device node (usually /dev/media0).
>   We call these mc-based devices. Controlling the hardware requires
>   opening the media device (/dev/media0), setting up the pipeline and
>   adjusting the sub-devices via /dev/v4l2-subdev?. Only streaming is
>   done via /dev/video?.
>
> In other words, both configuration and streaming go through the video
> device node on devnode-based drivers, while video device nodes are
> used only for streaming on mc-based drivers.
>
> "Standard" media applications, including open source ones (Camorama,
> Cheese, Xawtv, Firefox, Chromium, ...) and closed source ones (Skype,
> Chrome, ...), support devnode-based devices[1]. Also, when just one
> media device is connected, the streaming/control device is typically
> /dev/video0.
>
> [1] It should be noted that closed-source applications tend to have
> various bugs that prevent them from working properly on many
> devnode-based devices. Due to that, some additional blocks were
> required in libv4l to support some of them. Skype is a good example,
> as we had to include a software scaler in libv4l to make it happy. So
> in practice not everything works smoothly with closed-source
> applications on devnode-based drivers. A few such adjustments were
> also made in some drivers and/or libv4l, in order to fulfill some
> open-source app requirements.
>
> Support for mc-based devices currently requires a specialized
> application in order to prepare the device for use (set up the
> pipelines, adjust the hardware controls, etc). Once the pipeline is
> set, streaming goes via /dev/video?, although usually some
> /dev/v4l2-subdev? devnodes should also be opened, in order to
> implement the algorithms designed to make video quality reasonable.
> On such devices, it is not uncommon for the device node used by the
> application to be a seemingly random number (with the OMAP3 driver it
> is typically either /dev/video4 or /dev/video6).
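>
> Just to illustrate what such a specialized setup looks like today, the
> pipeline is typically configured with tools like media-ctl and
> v4l2-ctl before streaming can start. The entity names, pad numbers and
> formats below are only placeholders (the real ones depend on the board
> and come from 'media-ctl -p'):
>
>   # enable the links from the video decoder to the ISP and from the
>   # ISP to a capture video node, then set the format on the source pad
>   media-ctl -d /dev/media0 -r
>   media-ctl -d /dev/media0 -l '"tvp5150 1-005c":1 -> "OMAP3 ISP CCDC":0 [1]'
>   media-ctl -d /dev/media0 -l '"OMAP3 ISP CCDC":1 -> "OMAP3 ISP CCDC output":0 [1]'
>   media-ctl -d /dev/media0 -V '"tvp5150 1-005c":1 [fmt:UYVY8_2X8/720x480]'
>
>   # only then can frames be captured from the connected video node
>   v4l2-ctl -d /dev/video2 --stream-mmap --stream-count=10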
>
> One example of such hardware is the OMAP3-based hardware:
>
> http://www.infradead.org/~mchehab/mc-next-gen/omap3-igepv2-with-tvp5150.png
>
> In the picture, there's a graph with the hardware blocks in blue/dark
> blue and the corresponding devnode interfaces in yellow.
>
> The mc-based approach was taken when support for the Nokia N9/N900
> cameras was added (those use an OMAP3 SoC). It is required because the
> camera hardware on the SoC comes with a media processor (ISP), which
> does a lot more than just capturing, allowing complex algorithms to
> enhance image quality at runtime. Those algorithms are known as 3A -
> an acronym for 3 other acronyms:
>
> - AE (Auto Exposure);
> - AF (Auto Focus);
> - AWB (Auto White Balance).
>
> The main reason that drove the MC design is that the 3A algorithms
> (that is, the 3A control loop, and sometimes part of the image
> processing itself) often need to run, at least partially, on the CPU.
> As a kernel-space implementation wasn't possible, we needed a
> lower-level UAPI.
>
> Setting up a camera with such ISPs is harder because the pipelines to
> be set actually depend on the requirements of those 3A algorithms.
> Also, usually, the 3A algorithms use some chipset-specific userspace
> API that exports some image properties, calculated by the ISP, to
> speed up the convergence of those algorithms.
>
> Btw, usually the 3A algorithms are IP-protected, provided by vendors
> as binary-only blobs, although there are a few OSS implementations.
>
> Part of the problem is that, so far, there isn't a proper userspace
> API to implement 3A libraries. Once we have a userspace camera stack,
> we hope that we'll gradually increase the number and quality of
> open-source 3A stacks.
>
>
> 1.2 V4L2 userspace aspects
> --------------------------
>
> Back when USB cameras were introduced, the hardware was really simple:
> the cameras had a CMOS or CCD sensor and a chip that bridges the data
> through USB. Camera sensors typically provide data using a Bayer
> format, but the USB bridges usually have their own proprietary ways to
> pack the data, in order to reduce the bandwidth (the first
> implementations were using USB version 1.1).
>
> So, V4L2 has a myriad of different formats, in order to match each
> camera sensor format. At the end of the day, applications were able to
> use only a subset of the available hardware, since they needed to come
> with format converters for all the formats the developer chose to
> support (usually a very small subset of the available ones).
>
> That said, newer cameras have converged towards a small set of
> standard formats - except for secondary data streams (like depth
> maps). Yet, industrial cameras and newer technologies, like 3D,
> light-field, etc., may still bring new formats.
>
> To put an end to this mess, a userspace library was written, called
> libv4l. It supports all those proprietary formats, so applications can
> use an RGB or YUV format without needing to be concerned about
> conversions.
>
> The way it works is by adding wrappers to the system calls: open,
> close, ioctl, mmap, munmap. So, converting an application to use it is
> really simple: in the application's source code, all that is needed is
> to prepend the existing calls with "v4l2_", e.g. v4l2_open,
> v4l2_close, etc.
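>
> A minimal example of such a conversion (error handling omitted for
> brevity; this is just an illustration of the wrapper calls):
>
>   #include <fcntl.h>
>   #include <libv4l2.h>
>   #include <linux/videodev2.h>
>
>   int main(void)
>   {
>           struct v4l2_format fmt = { .type = V4L2_BUF_TYPE_VIDEO_CAPTURE };
>           int fd = v4l2_open("/dev/video0", O_RDWR);
>
>           /* libv4l converts to RGB24 in software if the hardware only
>            * provides some proprietary format */
>           fmt.fmt.pix.width = 640;
>           fmt.fmt.pix.height = 480;
>           fmt.fmt.pix.pixelformat = V4L2_PIX_FMT_RGB24;
>           v4l2_ioctl(fd, VIDIOC_S_FMT, &fmt);
>
>           /* ... v4l2_mmap()/v4l2_read() would be used for the frames ... */
>
>           v4l2_close(fd);
>           return 0;
>   }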
>
> All the open source apps we know of now support libv4l. On a few (like
> GStreamer), support for it is optional.
>
> It should be noted that libv4l also handles the scaling and the rough
> auto-gain and auto-white-balance that are required by some cameras to
> achieve a usable image.
>
> In order to support closed source applications, another wrapper was
> added, allowing any closed source application to use it via
> LD_PRELOAD. For example, using Skype with it is as simple as calling
> it with:
>
> $ LD_PRELOAD=/usr/lib/libv4l/v4l1compat.so /usr/bin/skypeforlinux
>
>
> 2. Current problems
> ===================
>
> 2.1 Libv4l can slow down image handling
> ---------------------------------------
>
> Nowadays, almost all new "simple" cameras are connected via USB using
> the UVC class (USB Video Class). UVC standardized the allowed formats,
> and most apps just implement them. The UVC hardware is more complex,
> having format converters inside it. So, for most usages, format
> conversion isn't needed anymore.
>
> The need to do format conversion in software makes libv4l slow,
> requiring lots of CPU in order to convert 4K or 8K frames, and it is
> even worse with 3D cameras.
>
> Also, due to the need to support LD_PRELOAD, zero-copy buffer sharing
> via DMABUF currently doesn't work with libv4l.
>
> Right now, GStreamer defaults to not enabling libv4l2, for several
> reasons:
> - Crash when CREATE_BUFS is being used;
> - Crash in the JPEG decoder (when frames are corrupted);
> - Apps exporting DMABuf need to be aware of the emulation, otherwise
>   the exported DMABufs are in the original format;
> - RW emulation only initializes the queue on the first read (causing
>   userspace poll() to fail);
> - Signature of v4l2_mmap does not match mmap() (minor);
> - The colorimetry does not seem to be emulated when converting;
> - Sub-optimal locking (at least the deadlocks were fixed).
>
> Most of the above are due to new features added to the kernel uAPI,
> but not added to libv4l2.
>
> These issues are already worked around in GStreamer, but with a loss
> of features, of course. There are other cases where something worked
> without libv4l2 but didn't work with it; GStreamer developers haven't
> tracked down the cause yet. Since 1.14, libv4l2 can be enabled at run
> time using the environment variable GST_V4L2_USE_LIBV4L2=1.
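>
> For example (the pipeline below is just an illustration; any GStreamer
> pipeline using v4l2src works the same way):
>
> $ GST_V4L2_USE_LIBV4L2=1 gst-launch-1.0 v4l2src device=/dev/video0 ! \
>       videoconvert ! autovideosink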
>
> 2.2 Modern hardware is starting to come with "complex" camera ISPs
> -------------------------------------------------------------------
>
> While mc-based devices were limited to SoCs, it was easy to "delegate"
> the task of talking to the hardware to the embedded system designers.
>
> However, this is changing. The Dell Latitude 5285 laptop is a standard
> PC with a Core i3, i5 or i7 CPU, which comes with the Intel IPU3 ISP
> hardware[2].
>
> [2] https://www.spinics.net/lists/linux-usb/msg167478.html
>
> There, instead of a USB camera, the hardware is equipped with an
> MC-based ISP connected to its camera. Currently, despite having a
> kernel driver for it, the camera doesn't work with any userspace
> application.
>
> I'm also aware of other projects that are considering the usage of
> mc-based devices for non-dedicated hardware.
>
> 3. How to solve it?
> ===================
>
> That's the main focus of the meeting :-)
>
> From a previous discussion I had with the media sub-maintainers, there
> are at least two actions that seem required. I'm listing them below as
> a starting point for the discussions, but we can eventually come up
> with a different approach after the meeting.
>
> 3.1 Library support for mc-based hardware
> =========================================
>
> In order to support such hardware, we'll need to do some redesign,
> mainly in userspace, via a library[3] that will replace/complement
> libv4l2.
>
> The idea is to work on a library API that will allow splitting the
> format conversion into a separate part of it, add support for DMABUF
> and come up with a way for the library to work transparently with
> both devnode-based and mc-based hardware.
>
> The library should be capable of doing dynamic pipeline setups,
> including the ability to transfer buffers between different media
> devices, as, in several cases, a camera input device node should be
> connected to an m2m device that would do some image processing and
> provide output data in a standard video format.
>
> One possibility is to redesign libv4l2. It should be noted that the
> VideoLAN people are working on a daemon that could also provide
> another way to implement it[4].
>
> Once we get a proper camera stack, we need to support traditional
> applications that use libv4l2 and/or the new library directly. To
> that end we will need to expose the libv4l2 API on top of the camera
> stack (through libv4l2 if it makes sense, or through a separate
> implementation) and implement a transparent LD_PRELOAD-based library.
>
> That involves adding to the library the capacity to set up hardware
> pipelines and to propagate controls among their sub-devices (see the
> sketch after the footnotes below). Eventually, part of it will be
> done in the kernel.
>
> That should give a performance increase in the library and would
> allow GStreamer to use it by default, without compromising
> performance.
>
> [3] I don't rule out that some kernel changes could also be part of
> the solution, like, for example, doing control propagation along the
> pipeline in simple use case scenarios.
>
> [4] With the advent of sandboxed applications, there is a need to
> control access to cameras through a daemon. The same daemon is also
> used to control access to screen capture on Wayland (instead of
> letting any random application capture your screen, like on X11). The
> effort is led by the desktop team at Red Hat. PipeWire already has
> native V4L2 support and is already integrated in GStreamer in a way
> that it can totally replace the V4L2 capture component there.
> PipeWire is plugin based, so more types of camera support can be
> added. Remote daemons can also provide streams, as is the case for
> compositors and screen casting. An extra benefit is that you can have
> multiple applications reading frames from the same camera. It also
> allows sandboxed applications (which do not have access to /dev) to
> use the cameras. In this context, proprietary or HW-specific
> algorithms could be implemented in userspace as PipeWire plugins, and
> applications will then automatically be able to enumerate and use
> them.
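>
> Just to make the above more concrete, here is a minimal sketch of the
> kind of pipeline setup such a library would have to do internally on
> mc-based hardware, using the existing media controller uAPI (error
> handling omitted; the entity IDs and pad numbers are placeholders that
> would come from MEDIA_IOC_ENUM_ENTITIES/MEDIA_IOC_ENUM_LINKS):
>
>   #include <fcntl.h>
>   #include <sys/ioctl.h>
>   #include <linux/media.h>
>
>   int main(void)
>   {
>           int fd = open("/dev/media0", O_RDWR);
>
>           /* enable one link of the pipeline, e.g. an ISP block to a
>            * capture video node (placeholder entity IDs/pad indexes) */
>           struct media_link_desc link = {
>                   .source = { .entity = 16, .index = 1 },
>                   .sink   = { .entity = 5,  .index = 0 },
>                   .flags  = MEDIA_LNK_FL_ENABLED,
>           };
>           ioctl(fd, MEDIA_IOC_SETUP_LINK, &link);
>
>           /* the pad formats would then be set via VIDIOC_SUBDEV_S_FMT
>            * on the /dev/v4l2-subdev? nodes, and the V4L2 controls
>            * adjusted through those same subdev nodes */
>           return 0;
>   }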
>
> 3.2 libv4l2 support for 3A algorithms
> =====================================
>
> The 3A algorithm handling is highly dependent on the hardware. The
> idea here is to allow libv4l to have a set of 3A algorithms that will
> be specific to certain mc-based hardware.
>
> One requirement, if we want vendor stacks to use our solution, is
> that it should allow external closed-source algorithms to run as
> well.
>
> The 3A library API must be standardized, to allow the closed-source
> vendor implementation to be replaced by an open-source implementation
> should someone have the time and energy (and qualifications) to write
> one.
>
> Sandboxed execution of the 3A library must be possible, as
> closed-source code can't always be blindly trusted. This includes the
> ability to wrap the library in a daemon, should the platform's
> multimedia stack wish so, and to avoid any direct access to the
> kernel devices by the 3A library itself (all accesses should be
> marshaled by the camera stack).
>
> Please note that this daemon is *not* a camera daemon that would
> communicate with the V4L2 driver through a custom back channel.
>
> The decision to run the 3A library in a sandboxed process or to call
> it directly from the camera stack should be left to the camera stack
> and to the platform integrator, and should not be visible to the 3A
> library.
>
> The 3A library must be usable on major Linux-based camera stacks (the
> Android and Chrome OS camera HALs are certainly important targets,
> more can be added) unmodified, which will allow usage of the vendor
> binary provided for Chrome OS or Android on regular Linux systems.
>
> It would make sense to design a modular camera stack, and try to make
> most components as platform-independent as possible. This should
> include:
>
> - the kernel drivers (V4L2-compliant and usable without any
>   closed-source userspace component);
> - the 3A library;
> - any other component that could be shared (for instance a possible
>   request API library).
>
> The rest of the code will mostly be glue around those components to
> integrate them in a particular camera stack, and should be as
> platform-agnostic as possible.
>
> In the case of the Android camera HAL, ideally it would be glue that
> could be used with different camera vendors (probably with some kind
> of vendor-specific configuration, or possibly with a separate
> vendor-specific component to handle pipeline configuration).
>
> 4. Complex camera workshop
> ==========================
>
> The workshop will happen in Tokyo, Japan, on Jun 19, at the Google
> offices. The location is:
>
> 〒106-6126 Tokyo, Minato, Roppongi, 6 Chome−10−1 Roppongi Hills Mori Tower 44F
>
> 4.1 Physical Attendees
> ======================
>
> Tomasz Figa <tfiga@xxxxxxxxxx>
> Mauro Carvalho Chehab <mchehab+samsung@xxxxxxxxxx>
> Kieran Bingham <kieran.bingham@xxxxxxxxxxxxxxxx>
> Laurent Pinchart <laurent.pinchart@xxxxxxxxxxxxxxxx>
> Niklas Söderlund <niklas.soderlund@xxxxxxxxxxxx>
> Zheng, Jian Xu <jian.xu.zheng@xxxxxxxxx>
>
> Anyone else?

I will probably be there as well.

Alexandre Courbot <acourbot@xxxxxxxxxx>

Cheers,
Alex.

>
> 4.2. Attendees Via Google Hangouts
> ==================================
>
> Hans Verkuil <hverkuil@xxxxxxxxx> - Via Google Hangouts - maybe only in the afternoon
> Javier Martinez Canillas <javier@xxxxxxxxxxxx> - Via Google Hangouts - only at reasonable TZ-compatible hours
> Ricky - Google camera team in Taipei - Via Google Hangouts
>
> Anyone else?