Re: [ANN v2] Complex Camera Workshop - Tokyo - Jun, 19

On Mon, Jun 4, 2018 at 10:33 PM Mauro Carvalho Chehab
<mchehab+samsung@xxxxxxxxxx> wrote:
>
> Hi all,
>
> I have hopefully consolidated all the comments I received on the past
> announcement with regards to the complex camera workshop we're planning
> to hold in Tokyo, just before the Open Source Summit in Japan.
>
> The main focus of the workshop is to enable support for devices whose
> camera is connected to MC-based hardware.
>
> I'm enclosing a detailed description of the problem, in order to
> allow the interested parties to be on the same page.
>
> We need to work towards an agenda for the meeting.
>
> From my side, I think we should have at least the following topics on
> the agenda:
>
> - a quick review of what's currently in libv4l2;
> - a presentation of the PipeWire solution;
> - a discussion of the requirements for the new solution;
> - a discussion about how we'll address the work - who will do what.

I believe Intel's Jian Xu would be able to give us a brief
introduction to the IPU3 hardware architecture, and possibly to
upcoming hardware generations as well.

My experience with existing generations of ISPs from other vendors is
that the main principles of operation are very similar to the model
represented by IPU3 and very different from the OMAP3 example
mentioned by Mauro below. I have commented further on it below.

>
> Comments? Suggestions?
>
> Is anyone else planning to either be there physically or join via
> Google Hangouts?
>
> Tomasz,
>
> Do you have any limit on the number of people that could join us
> via Google Hangouts?
>

Technically, Hangouts should be able to handle really huge
multi-party conferences. There is obviously some limitation on the
client side, since thumbnails of participants need to be decoded in
real time, so even if the resolution is low, a very slow client might
see some really bad frame drops.

However, I often have meetings with around 8 parties and they tend to
work fine. We can also disable video for all participants who don't
need to present anything at the moment, and the problem would go away
completely.

>
> Regards,
> Mauro
>
> ---
>
> 1. Introduction
> ===============
>
> 1.1 V4L2 Kernel aspects
> -----------------------
>
> The media subsystem supports two types of devices:
>
> - "traditional" media hardware, supported via V4L2 API. On such hardware,
>   opening a single device node (usually /dev/video0) is enough to control
>   the entire device. We call it as devnode-based devices.
>   An application sometimes may need to use multiple video nodes with
>   devnode-based drivers to capture multiple streams in parallel
>   (when the hardware allows it of course). That's quite common for
>   Analog TV devices, where both /dev/video0 and /dev/vbi0 are opened
>   at the same time.
>
> - Media-controller based devices. On those devices, there are typically
>   several /dev/video? nodes and several /dev/v4l2-subdev? nodes, plus
>   a media controller device node (usually /dev/media0).
>   We call these mc-based devices. Controlling the hardware requires
>   opening the media device (/dev/media0), setting up the pipeline and
>   adjusting the sub-devices via /dev/v4l2-subdev?. Only streaming is
>   controlled by /dev/video?.
>
> In other words, both configuration and streaming go through the video
> device node on devnode-based drivers, while the video device nodes are
> used only for streaming on mc-based drivers.
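
As a concrete illustration of the mc-based model, here is a minimal
sketch of enumerating the entities behind the media controller node.
It assumes /dev/media0 and omits most error handling:

/* Minimal sketch: list the entities of a media controller device. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/media.h>

int main(void)
{
        struct media_entity_desc desc;
        int fd = open("/dev/media0", O_RDWR);

        if (fd < 0)
                return 1;

        memset(&desc, 0, sizeof(desc));
        desc.id = MEDIA_ENT_ID_FLAG_NEXT;       /* start at the first entity */

        /* With MEDIA_ENT_ID_FLAG_NEXT set, each call returns the next
         * entity; the loop stops when the kernel runs out of entities. */
        while (ioctl(fd, MEDIA_IOC_ENUM_ENTITIES, &desc) == 0) {
                printf("entity %u: %s\n", desc.id, desc.name);
                desc.id |= MEDIA_ENT_ID_FLAG_NEXT;
        }

        close(fd);
        return 0;
}
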
>
> "Standard" media applications, both open source ones (Camorama, Cheese,
> Xawtv, Firefox, Chromium, ...) and closed source ones (Skype, Chrome, ...),
> support devnode-based devices[1]. Also, when just one media device is
> connected, the streaming/control device is typically /dev/video0.
>
> [1] It should be noted that closed-source applications tend to have
> various bugs that prevent them from working properly on many devnode-based
> devices. Due to that, some additional blocks were required in libv4l to
> support some of them. Skype is a good example, as we had to include a
> software scaler in libv4l to make it happy. So in practice not everything
> works smoothly with closed-source applications on devnode-based drivers.
> A few such adjustments were also made to some drivers and/or libv4l, in
> order to fulfill some open-source app requirements.
>
> Support for mc-based devices currently requires a specialized application
> to prepare the device for its usage (setting up pipelines, adjusting
> hardware controls, etc). Once the pipeline is set, streaming goes via
> /dev/video?, although usually some /dev/v4l2-subdev? devnodes should also
> be opened, in order to implement algorithms designed to make video quality
> reasonable.
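
To give an idea of what such a specialized application does, here is a
rough sketch of setting the active format on one sub-device pad. The
devnode path, pad number, resolution and media bus code are arbitrary
examples; a real application derives them from the media graph:

/* Rough sketch: configure the active format on pad 0 of a sub-device.
 * All values below are arbitrary examples. */
#include <fcntl.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/media-bus-format.h>
#include <linux/v4l2-subdev.h>

static int set_pad_format(const char *devnode)
{
        struct v4l2_subdev_format fmt;
        int fd = open(devnode, O_RDWR);
        int ret;

        if (fd < 0)
                return -1;

        memset(&fmt, 0, sizeof(fmt));
        fmt.which = V4L2_SUBDEV_FORMAT_ACTIVE;
        fmt.pad = 0;
        fmt.format.width = 1920;
        fmt.format.height = 1080;
        fmt.format.code = MEDIA_BUS_FMT_SGRBG10_1X10;   /* 10-bit Bayer */

        ret = ioctl(fd, VIDIOC_SUBDEV_S_FMT, &fmt);
        close(fd);
        return ret;
}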

To further complicate the problem, on many modern imaging subsystems
(Intel IPU3, Rockchip RKISP1), there is more than 1 video output
(CAPTURE device), for example:
1) full resolution capture stream and
2) downscaled preview stream.

Moreover, many ISPs also produce per-frame metadata (statistics) for
3A algorithms, which then produce per-frame metadata (parameters) for
the processing of the next frame. These are also exposed as /dev/video?
nodes with respective V4L2_BUF_TYPE_META_* queues.
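
As a sketch of how a 3A process would consume such a statistics node,
assuming an MMAP-backed META_CAPTURE queue (error handling and the
mmap() of the buffers are omitted):

/* Sketch: start streaming on a metadata (statistics) capture queue. */
#include <string.h>
#include <sys/ioctl.h>
#include <linux/videodev2.h>

static void start_stats_queue(int fd)
{
        struct v4l2_requestbuffers reqbufs;
        struct v4l2_buffer buf;
        int type = V4L2_BUF_TYPE_META_CAPTURE;
        unsigned int i;

        memset(&reqbufs, 0, sizeof(reqbufs));
        reqbufs.count = 4;
        reqbufs.type = V4L2_BUF_TYPE_META_CAPTURE;
        reqbufs.memory = V4L2_MEMORY_MMAP;
        ioctl(fd, VIDIOC_REQBUFS, &reqbufs);

        for (i = 0; i < reqbufs.count; i++) {
                memset(&buf, 0, sizeof(buf));
                buf.type = V4L2_BUF_TYPE_META_CAPTURE;
                buf.memory = V4L2_MEMORY_MMAP;
                buf.index = i;
                ioctl(fd, VIDIOC_QBUF, &buf);
        }
        ioctl(fd, VIDIOC_STREAMON, &type);

        /* From here on, each VIDIOC_DQBUF yields one frame's worth of
         * statistics for the 3A loop to consume. */
}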

It gets even more complicated on systems with separate input (e.g.
CSI2) and processing (ISP) hardware, such as Intel IPU3. In such cases,
the raw frames captured directly from the CSI2 interface are not usable
by end-user applications. This means that some component in userspace
needs to forward the raw frames to the ISP, and only the output of the
ISP can be passed to the application.
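
A rough sketch of one such forwarding step, assuming single-planar
queues, MMAP buffers on the CSI2 capture side that were previously
exported with VIDIOC_EXPBUF, and isp_fd pointing at the ISP's OUTPUT
(input) video node:

/* Sketch: pass one raw frame from the CSI2 capture node to the ISP
 * without copying, by re-queueing the exported dmabuf. The fd naming
 * and queue types are assumptions for illustration only. */
#include <string.h>
#include <sys/ioctl.h>
#include <linux/videodev2.h>

static void forward_raw_frame(int csi_fd, int isp_fd, const int *dmabuf_fds)
{
        struct v4l2_buffer cap, out;

        /* Dequeue a filled raw buffer from the CSI2 capture queue. */
        memset(&cap, 0, sizeof(cap));
        cap.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
        cap.memory = V4L2_MEMORY_MMAP;
        ioctl(csi_fd, VIDIOC_DQBUF, &cap);

        /* Queue the same memory on the ISP's OUTPUT queue as a dmabuf. */
        memset(&out, 0, sizeof(out));
        out.type = V4L2_BUF_TYPE_VIDEO_OUTPUT;
        out.memory = V4L2_MEMORY_DMABUF;
        out.index = cap.index;
        out.m.fd = dmabuf_fds[cap.index];
        out.bytesused = cap.bytesused;
        ioctl(isp_fd, VIDIOC_QBUF, &out);
}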

> On such devices, it is not uncommon for the device node used by the
> application to have a seemingly random number (on the OMAP3 driver it
> is typically either /dev/video4 or /dev/video6).
>
> One example is the following OMAP3-based hardware:
>
>         http://www.infradead.org/~mchehab/mc-next-gen/omap3-igepv2-with-tvp5150.png
>
> The picture shows a graph with the hardware blocks in blue/dark blue
> and the corresponding devnode interfaces in yellow.
>
> The mc-based approach was taken when support for the Nokia N9/N900
> cameras was added (which have an OMAP3 SoC). It is required because the
> camera hardware on the SoC comes with a media processor (ISP), which
> does a lot more than just capturing, allowing complex algorithms to
> enhance image quality at runtime. Those algorithms are known as 3A - an
> acronym for 3 other acronyms:
>
>         - AE (Auto Exposure);
>         - AF (Auto Focus);
>         - AWB (Auto White Balance).
>
> The main reason that drove the MC design is that the 3A algorithms (that is
> the 3A control loop, and sometimes part of the image processing itself) often
> need to run, at least partially, on the CPU. As a kernel-space implementation
> wasn't possible, we needed a lower-level UAPI.
>
> Setting up a camera with such ISPs is harder because the pipelines to
> be set actually depend on the requirements for those 3A algorithms to
> run. Also, the 3A algorithms usually use some chipset-specific userspace
> API, which exports some image properties, calculated by the ISP, to
> speed up the convergence of those algorithms.
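
To make the AWB case concrete, the simplest textbook example is a
gray-world algorithm: given per-channel averages exported by the ISP,
the control loop computes gains that equalize the channels. The stats
layout below is purely hypothetical, since each ISP defines its own:

/* Sketch of one gray-world AWB step; struct awb_stats is hypothetical. */
struct awb_stats {
        double avg_r, avg_g, avg_b;     /* per-frame channel averages */
};

struct awb_gains {
        double r, b;                    /* G is used as the reference */
};

static struct awb_gains gray_world(const struct awb_stats *s)
{
        struct awb_gains g = { 1.0, 1.0 };

        /* Gray-world assumption: the scene averages to neutral gray,
         * so scale R and B until they match the green average. */
        if (s->avg_r > 0.0 && s->avg_b > 0.0) {
                g.r = s->avg_g / s->avg_r;
                g.b = s->avg_g / s->avg_b;
        }
        return g;
}
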
>
> Btw, the 3A algorithms are usually IP-protected, provided by vendors
> as binary-only blobs, although there are a few OSS implementations.
>
> Part of the problem is that, so far, there isn't a proper userspace API
> for implementing 3A libraries. Once we have a userspace camera stack, we
> hope that we'll gradually increase the number and quality of open-source
> 3A stacks.
>
[snip]
>
> 2.2 Modern hardware is starting to come with "complex" camera ISP
> -----------------------------------------------------------------
>
> While mc-based devices were limited to SoCs, it was easy to
> "delegate" the task of talking to the hardware to the
> embedded hardware designers.
>
> However, this is changing. The Dell Latitude 5285 laptop is a standard
> PC with a Core i3, i5 or i7 CPU, which comes with the
> Intel IMU3 ISP hardware[2].

IPU3 :)

>
> [2] https://www.spinics.net/lists/linux-usb/msg167478.html
>
> There, instead of a USB camera, the hardware is equipped with an
> MC-based ISP connected to its camera. Currently, despite having
> a kernel driver for it, the camera doesn't work with any
> userspace application.
>
> I'm also aware of other projects that are considering the usage of
> mc-based devices for non-dedicated hardware.
>
[snip]
>
> 3.2 libv4l2 support for 3A algorithms
> =====================================
>
> 3A algorithm handling is highly dependent on the hardware. The
> idea here is to allow libv4l to have a set of 3A algorithms that
> will be specific to certain mc-based hardware.
>
> One requirement, if we want vendor stacks to use our solution, is that
> it should allow external closed-source algorithms to run as well.
>
> The 3A library API must be standardized, to allow the closed-source
> vendor implementation to be replaced by an open-source implementation
> should someone have the time and energy (and qualifications) to write
> one.
>
> Sandboxed execution of the 3A library must be possible, as closed-source
> code can't always be blindly trusted. This includes the ability to wrap
> the library in a daemon, should the platform's multimedia stack wish to,
> and to avoid any direct access to the kernel devices by the 3A library
> itself (all accesses should be marshaled by the camera stack).
>
> Please note that this daemon is *not* a camera daemon that would
> communicate with the V4L2 driver through a custom back channel.
>
> The decision to run the 3A library in a sandboxed process or to call
> it directly from the camera stack should be left to the camera stack
> and to the platform integrator, and should not be visible to the 3A
> library.
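
To make this more concrete, here is a purely hypothetical sketch of
what such a marshaled 3A library interface could look like. None of
these names exist today; the point is only that the library sees
buffers and callbacks, never kernel devices:

/* Hypothetical 3A library ABI sketch - not an existing API. The
 * library never opens /dev/* itself; all device access is marshaled
 * by the camera stack, so the library can run in-process or sandboxed
 * without noticing the difference. */
#include <stddef.h>
#include <stdint.h>

struct cam3a_ops {
        /* Initialize with vendor tuning data provided by the stack. */
        int (*init)(void **ctx, const uint8_t *tuning, size_t size);

        /* Consume one frame's ISP statistics, produce ISP parameters
         * for a future frame. Both blobs are chipset-specific. */
        int (*process_stats)(void *ctx,
                             const void *stats, size_t stats_size,
                             void *params, size_t params_size);

        void (*close)(void *ctx);
};

/* A vendor library would export a single, versioned entry point. */
const struct cam3a_ops *cam3a_get_ops(uint32_t abi_version);
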
>
> The 3A library must be usable on major Linux-based camera stacks (the
> Android and Chrome OS camera HALs are certainly important targets,
> more can be added) unmodified, which will allow usage of the vendor
> binary provided for Chrome OS or Android on regular Linux systems.

This is quite an interesting idea and it would be really useful if it
could be done. I'm kind of worried, though, about Android in
particular, since the execution environment on Android differs
significantly from regular Linux distributions (including Chrome OS,
which is not so far from one), namely:
- different libc (bionic) and dynamic linker - I guess this could be
solved by static linking?
- dedicated toolchains - perhaps not much of a problem if the per-arch
ABI is the same?

>
> It would make sense to design a modular camera stack, and try to make
> most components as platform-independent as possible. This should include:
>
> - the kernel drivers (V4L2-compliant and usable without any closed-source
>   userspace component);
> - the 3A library
> - any other component that could be shared (for instance a possible
>   request API library).
>
> The rest of the code will mostly be glue around those components to
> integrate them in a particular camera stack, and should be as
> platform-agnostic as possible.
>
> In the case of the Android camera HAL, ideally it would be a glue that
> could be used with different camera vendors (probably with some kind of
> vendor-specific configuration, or possibly with a separate vendor-specific
> component to handle pipeline configuration).
>
> 4 Complex camera workshop
> =========================
>
> The workshop will be happening in Tokyo, Japan, on Jun 19, at the
> Google offices. The location is:
>
> 〒106-6126 Tokyo, Minato, Roppongi, 6 Chome−10−1 Roppongi Hills Mori Tower 44F

Nearest station exits:
- Hibiya line Roppongi station exit 1c (recommended)
- Oedo line Roppongi station exit 3 (plus a few minutes' walk)

>
> 4.1 Physical Attendees
> ======================
>
> Tomasz Figa <tfiga@xxxxxxxxxx>
> Mauro Carvalho Chehab <mchehab+samsung@xxxxxxxxxx>
> Kieran Bingham <kieran.bingham@xxxxxxxxxxxxxxxx>
> Laurent Pinchart <laurent.pinchart@xxxxxxxxxxxxxxxx>
> Niklas Söderlund <niklas.soderlund@xxxxxxxxxxxx>
> Jian Xu Zheng <jian.xu.zheng@xxxxxxxxx>
>
> Anyone else?

Looking at the latest reply in this thread:

jacopo mondi <jacopo@xxxxxxxxxx>

If anyone else plans to attend, please tell me beforehand (at least
1-2 days in advance), as I need to arrange building access - it's a
multi-tenant office building. I'll contact each attendee separately
with further details by email.

>
> 4.2. Attendees Via Google Hangouts
> ==================================
>
> Hans Verkuil <hverkuil@xxxxxxxxx> - Via Google Hangouts - maybe only in the afternoon
> Javier Martinez Canillas <javier@xxxxxxxxxxxx> - Via Google Hangouts - only at reasonable TZ-compatible hours

What time zone would that be? I guess we could try to tweak the agenda
to take this into account.

Best regards,
Tomasz



