Re: [ANN v2] Complex Camera Workshop - Tokyo - Jun, 19

Javier Martinez Canillas <javier@xxxxxxxxxxxx> · Wed, 6 Jun 2018 10:41:50 +0200

[adding Wim Taymans and Mario Limonciello to CC, who said that they may
also join via Hangouts]

On Wed, Jun 6, 2018 at 6:19 AM, Tomasz Figa <tfiga@xxxxxxxxxxxx> wrote:
> On Mon, Jun 4, 2018 at 10:33 PM Mauro Carvalho Chehab
> <mchehab+samsung@xxxxxxxxxx> wrote:
>>
>> Hi all,
>>
>> I consolidated, hopefully, all comments I received on the previous
>> announcement with regards to the complex camera workshop we're planning
>> to hold in Tokyo, just before the Open Source Summit in Japan.
>>
>> The main focus of the workshop is to allow supporting devices with MC-based
>> hardware connected to a camera.
>>
>> I'm enclosing a detailed description of the problem, in order to
>> allow the interested parties to be on the same page.
>>
>> We need to work towards an agenda for the meeting.
>>
>> From my side, I think we should have at least the following topics on
>> the agenda:
>>
>> - a quick review of what's currently in libv4l2;
>> - a presentation about the PipeWire solution;

Wim mentioned that he could do this.

>> - a discussion about the requirements for the new solution;
>> - a discussion about how we'll address them - who will do what.
>
> I believe Intel's Jian Xu would be able to give us a brief
> introduction to the IPU3 hardware architecture and possibly also to
> upcoming hardware generations.
>
> My experience with existing generations of ISPs from other vendors is
> that the main principles of operation are very similar to the model
> represented by IPU3 and very much different to the OMAP3 example
> mentioned by Mauro below. I further commented on it below.
>
>>
>> Comments? Suggestions?
>>
>> Is anyone else planning to either be there physically or join via
>> Google Hangouts?
>>
>> Tomasz,
>>
>> Do you have any limit on the number of people that could join us
>> via Google Hangouts?
>>
>
> Technically, Hangouts should be able to work with really huge
> multi-party conferences. There is obviously some limitation on the
> client side, since thumbnails of participants need to be decoded in
> real time, so even if the resolution is low, a very slow client might
> see some really bad frame drops.
>
> However, I often have meetings with around 8 parties and it tends to
> work fine. We can also disable the video of all participants who don't
> need to present anything at the moment, and the problem would go away
> completely.
>
>>
>> Regards,
>> Mauro
>>
>> ---
>>
>> 1. Introduction
>> ===============
>>
>> 1.1 V4L2 Kernel aspects
>> -----------------------
>>
>> The media subsystem supports two types of devices:
>>
>> - "traditional" media hardware, supported via V4L2 API. On such hardware,
>>   opening a single device node (usually /dev/video0) is enough to control
>>   the entire device. We call these devnode-based devices.
>>   An application sometimes may need to use multiple video nodes with
>>   devnode-based drivers to capture multiple streams in parallel
>>   (when the hardware allows it of course). That's quite common for
>>   Analog TV devices, where both /dev/video0 and /dev/vbi0 are opened
>>   at the same time.
>>
>> - Media-controller based devices. On those devices, there are typically
>>   several /dev/video? nodes and several /dev/v4l-subdev? nodes, plus
>>   a media controller device node (usually /dev/media0).
>>   We call these mc-based devices. Controlling the hardware requires
>>   opening the media device (/dev/media0), setting up the pipeline and
>>   adjusting the sub-devices via /dev/v4l-subdev?. Only streaming is
>>   controlled by /dev/video?.
>>
>> In other words, both configuration and streaming go through the video
>> device node on devnode-based drivers, while video device nodes are
>> used only for streaming on mc-based drivers.
>>
>> "Standard" media applications, including open source ones (Camorama,
>> Cheese, Xawtv, Firefox, Chromium, ...) and closed source ones (Skype,
>> Chrome, ...), support devnode-based devices[1]. Also, when just one
>> media device is connected, the streaming/control device is typically
>> /dev/video0.
>>
>> [1] It should be noted that closed-source applications tend to have
>> various bugs that prevent them from working properly on many devnode-based
>> devices. Due to that, some additional blocks were required in libv4l to
>> support some of them. Skype is a good example, as we had to include a
>> software scaler in libv4l to make it happy. So in practice not everything
>> works smoothly with closed-source applications with devnode-based drivers.
>> A few such adjustments were also made on some drivers and/or libv4l, in
>> order to fulfill some open-source app requirements.
>>
>> Support for mc-based devices currently requires a specialized application
>> to prepare the device for use (set up pipelines, adjust hardware
>> controls, etc). Once the pipeline is set, the streaming goes via
>> /dev/video?, although usually some /dev/v4l-subdev? devnodes should also
>> be opened, in order to implement algorithms designed to make video quality
>> reasonable.
>
> To further complicate the problem, on many modern imaging subsystems
> (Intel IPU3, Rockchip RKISP1), there is more than 1 video output
> (CAPTURE device), for example:
> 1) full resolution capture stream and
> 2) downscaled preview stream.
>
> Moreover, many ISPs also produce per-frame metadata (statistics) for
> 3A algorithms, which then produce per-frame metadata (parameters) for
> processing of the next frame. These would also be exposed as /dev/video?
> nodes with respective V4L2_BUF_TYPE_META_* queues.
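
To make the statistics -> 3A -> parameters loop concrete, here is a toy
auto-exposure sketch. Everything in it is an assumption made for
illustration: the "mean_luma" statistic, the target value, the damping
factor and the fake_isp_stats() stand-in are not taken from any real
ISP or driver. The only point is the shape of the loop, where
statistics of frame N feed the parameters for frame N+1.

```python
def next_exposure(stats, exposure_us, target_luma=118.0,
                  min_us=100.0, max_us=33000.0, damping=0.5):
    """One AE step: statistics of frame N -> exposure for frame N+1."""
    mean = max(stats["mean_luma"], 1.0)      # avoid division by zero
    ratio = target_luma / mean
    # Only move part of the way towards the target to avoid oscillation.
    proposed = exposure_us * (ratio ** damping)
    return min(max(proposed, min_us), max_us)


def fake_isp_stats(exposure_us, scene_gain=0.01):
    """Stand-in for the ISP: brightness grows with exposure, clips at 255."""
    return {"mean_luma": min(255.0, exposure_us * scene_gain)}


def run_loop(frames=20, exposure_us=1000.0):
    """Iterate the control loop for a number of frames."""
    for _ in range(frames):
        # In a real stack: dequeue a buffer from the statistics node...
        stats = fake_isp_stats(exposure_us)
        # ...and queue the result on the parameters node for the next frame.
        exposure_us = next_exposure(stats, exposure_us)
    return exposure_us
```

With the toy scene above the loop settles close to the exposure at which
the mean luma reaches the target; a real implementation would of course
work on much richer statistics than a single mean value.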
>
> It is complicated even more on systems with separate input (e.g. CSI2)
> and processing (ISP) hardware, such as Intel IPU3. In such a case, the
> raw frames captured directly from the CSI2 interface are not usable
> for end-user applications. This means that some component in userspace
> needs to forward the raw frames to the ISP and only the output of the
> ISP can be passed to the application.
>
>> On such devices, it is not uncommon for the device node used by the
>> application to have an arbitrary number (on the OMAP3 driver, it is
>> typically either /dev/video4 or /dev/video6).
>>
>> One example of such hardware is this OMAP3-based board:
>>
>>         http://www.infradead.org/~mchehab/mc-next-gen/omap3-igepv2-with-tvp5150.png
>>
>> In the picture, there's a graph with the hardware blocks in blue/dark blue
>> and the corresponding devnode interfaces in yellow.
>>
>> The mc-based approach was taken when support for Nokia N9/N900 cameras
>> was added (which have an OMAP3 SoC). It is required because the camera
>> hardware on the SoC comes with a media processor (ISP), which does a lot
>> more than just capturing, allowing complex algorithms to enhance image
>> quality at runtime.
>> Those algorithms are known as 3A - an acronym for 3 other acronyms:
>>
>>         - AE (Auto Exposure);
>>         - AF (Auto Focus);
>>         - AWB (Auto White Balance).
>>
>> The main reason that drove the MC design is that the 3A algorithms (that is
>> the 3A control loop, and sometimes part of the image processing itself) often
>> need to run, at least partially, on the CPU. As a kernel-space implementation
>> wasn't possible, we needed a lower-level UAPI.
>>
>> Setting up a camera with such ISPs is harder because the pipelines to be
>> set actually depend on the requirements for those 3A algorithms to run.
>> Also, usually, the 3A algorithms use some chipset-specific userspace API,
>> that exports some image properties, calculated by the ISP, to speed up
>> the convergence of those algorithms.
>>
>> Btw, usually, the 3A algorithms are IP-protected, provided by vendors
>> as binary only blobs, although there are a few OSS implementations.
>>
>> Part of the problem is that, so far, there isn't a proper userspace API
>> to implement 3A libraries. Once we have a userspace camera stack, we
>> hope that we'll gradually increase the number and quality of open-source
>> 3A stacks.
>>
> [snip]
>>
>> 2.2 Modern hardware is starting to come with "complex" camera ISP
>> -----------------------------------------------------------------
>>
>> While mc-based devices were limited to SoCs, it was easy to
>> "delegate" the task of talking with the hardware to the
>> embedded hardware designers.
>>
>> However, this is changing. The Dell Latitude 5285 laptop is a standard
>> PC with an i3-core, i5-core or i7-core CPU, which comes with the
>> Intel IMU3 ISP hardware[2].
>
> IPU3 :)
>
>>
>> [2] https://www.spinics.net/lists/linux-usb/msg167478.html
>>
>> There, instead of a USB camera, the hardware is equipped with an
>> MC-based ISP, connected to its camera. Currently, despite having
>> a Kernel driver for it, the camera doesn't work with any
>> userspace application.
>>
>> I'm also aware of other projects that are considering the usage of
>> mc-based devices for non-dedicated hardware.
>>
> [snip]
>>
>> 3.2 libv4l2 support for 3A algorithms
>> =====================================
>>
>> The 3A algorithm handling is highly dependent on the hardware. The
>> idea here is to allow libv4l to have a set of 3A algorithms that
>> will be specific to certain mc-based hardware.
>>
>> One requirement, if we want vendor stacks to use our solution, is that
>> it should allow external closed-source algorithms to run as well.
>>
>> The 3A library API must be standardized, to allow the closed-source
>> vendor implementation to be replaced by an open-source implementation
>> should someone have the time and energy (and qualifications) to write
>> one.
>>
>> Sandboxed execution of the 3A library must be possible as closed-source
>> code can't always be blindly trusted. This includes the ability to wrap
>> the library in a daemon, should the platform's multimedia stack wish to
>> do so, and to avoid any direct access to the kernel devices by the 3A
>> library itself (all accesses should be marshaled by the camera stack).
>>
>> Please note that this daemon is *not* a camera daemon that would
>> communicate with the V4L2 driver through a custom back channel.
>>
>> The decision to run the 3A library in a sandboxed process or to call
>> it directly from the camera stack should be left to the camera stack
>> and to the platform integrator, and should not be visible to the 3A
>> library.
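
One way to picture this requirement is a library entry point that takes
plain data in and returns plain data out, so that the caller is free to
invoke it directly or through a marshaling boundary. The sketch below is
purely hypothetical: the names (toy_awb, DirectRunner, MarshalledRunner)
and the JSON encoding are assumptions made for the example, and the
"boundary" is only emulated in-process rather than being a real
sandboxed daemon.

```python
import json


def toy_awb(stats):
    """Toy grey-world white balance: scale red and blue to match green.

    Stands in for a vendor 3A library. It only sees statistics and never
    touches /dev nodes itself.
    """
    green = stats["mean_g"]
    return {"gain_r": green / stats["mean_r"],
            "gain_b": green / stats["mean_b"]}


class DirectRunner:
    """The camera stack calls the library in its own process."""

    def __init__(self, algo):
        self._algo = algo

    def process(self, stats):
        return self._algo(stats)


class MarshalledRunner:
    """Same call, but every request and reply crosses a byte boundary.

    _serve() is where a sandboxed daemon would sit; here it runs
    in-process so the example stays self-contained.
    """

    def __init__(self, algo):
        self._algo = algo

    def _serve(self, request_bytes):
        stats = json.loads(request_bytes)
        return json.dumps(self._algo(stats)).encode()

    def process(self, stats):
        reply = self._serve(json.dumps(stats).encode())
        return json.loads(reply)
```

Both runners expose the same process() call, so the choice between them
stays with the camera stack and the platform integrator, and the
algorithm code is identical in both cases.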
>>
>> The 3A library must be usable on major Linux-based camera stacks (the
>> Android and Chrome OS camera HALs are certainly important targets,
>> more can be added) unmodified, which will allow usage of the vendor
>> binary provided for Chrome OS or Android on regular Linux systems.
>
> This is quite an interesting idea and it would be really useful if it
> could be done. I'm kind of worried, though, about Android in
> particular, since the execution environment in Android differs
> significantly from regular Linux distributions (including Chrome OS,
> which is not so far from such), namely:
> - different libc (bionic) and dynamic linker - I guess this could be
> solved by static linking?
> - dedicated toolchains - perhaps not much of a problem if the per-arch
> ABI is the same?
>
>>
>> It would make sense to design a modular camera stack, and try to make
>> most components as platform-independent as possible. This should include:
>>
>> - the kernel drivers (V4L2-compliant and usable without any closed-source
>>   userspace component);
>> - the 3A library
>> - any other component that could be shared (for instance a possible
>>   request API library).
>>
>> The rest of the code will mostly be glue around those components to
>> integrate them in a particular camera stack, and should be as
>> platform-agnostic as possible.
>>
>> In the case of the Android camera HAL, ideally it would be a glue that
>> could be used with different camera vendors (probably with some kind of
>> vendor-specific configuration, or possibly with a separate vendor-specific
>> component to handle pipeline configuration).
>>
>> 4 Complex camera workshop
>> =========================
>>
>> The workshop will take place in Tokyo, Japan, on Jun 19, at the
>> Google offices. The location is:
>>
>> 〒106-6126 Tokyo, Minato, Roppongi, 6 Chome−10−1 Roppongi Hills Mori Tower 44F
>
> Nearest station exits:
> - Hibiya line Roppongi station exit 1c (recommended)
> - Oedo line Roppongi station exit 3 (and few minutes walk)
>
>>
>> 4.1 Physical Attendees
>> ======================
>>
>> Tomasz Figa <tfiga@xxxxxxxxxx>
>> Mauro Carvalho Chehab <mchehab+samsung@xxxxxxxxxx>
>> Kieran Bingham <kieran.bingham@xxxxxxxxxxxxxxxx>
>> Laurent Pinchart <laurent.pinchart@xxxxxxxxxxxxxxxx>
>> Niklas Söderlund <niklas.soderlund@xxxxxxxxxxxx>
>> Jian Xu Zheng <jian.xu.zheng@xxxxxxxxx>
>>
>> Anyone else?
>
> Looking at the latest reply in this thread:
>
> jacopo mondi <jacopo@xxxxxxxxxx>
>
> Anyone else, please tell me beforehand (at least 1-2 days before), as
> I need to take care of building access, since it's a multi-tenant
> office building. I'll contact each attendee separately with further
> details by email.
>
>>
>> 4.2. Attendees Via Google Hangouts
>> ==================================
>>
>> Hans Verkuil <hverkuil@xxxxxxxxx> - Via Google Hangouts - maybe only in the afternoon
>> Javier Martinez Canillas <javier@xxxxxxxxxxxx> - Via Google Hangouts - only on reasonable TZ-compatible-hours
>
> What time zone would that be? I guess we could try to tweak the agenda
> to take this into account.
>

Wim, Nicolas and myself are in CEST (UTC +2). The best time for Wim to
do the PipeWire presentation would be 10:30 am CEST.

Best regards,
Javier